arXiv: 2604.04442 · v1 · submitted 2026-04-06 · cs.CR · cs.LG · cs.MA

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Yiyao Zhang , Diksha Goel , Hussain Ahmad

Pith reviewed 2026-05-10 20:12 UTC · model grok-4.3

keywords: causal modeling · multi-agent reinforcement learning · autonomous cyber defense · structural causal model · explainable AI · false positive reduction · adversarial policies · IoT security

The pith

C-MADF learns a causal graph from telemetry to constrain an adversarial RL system, cutting false positives in autonomous cyber defense to 1.8% on real IoT data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents C-MADF, a framework that first extracts a Structural Causal Model from historical network telemetry and turns it into a directed acyclic graph of permitted response actions. This graph becomes the action space of a Markov decision process solved by two opposing reinforcement-learning agents: one that pushes for decisive threat responses and another that enforces conservative limits. Disagreement between the agents produces an explainability score that flags uncertain decisions for human review. The authors show that this causal-plus-adversarial design sharply reduces overreactions that plague correlation-based defenses when inputs are ambiguous or perturbed.

Core claim

C-MADF learns a Structural Causal Model from historical telemetry, compiles it into an investigation-level DAG that restricts admissible transitions, and solves the resulting constrained MDP with a dual-agent RL system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Policy divergence is quantified by a Policy Divergence Score and surfaced through an Explainability-Transparency Score. On the CICIoT2023 dataset the system reports 0.997 precision, 0.961 recall, 0.979 F1-score and a false-positive rate of 1.8 percent, down from 11.2 percent, 9.7 percent and 8.4 percent in three literature baselines.
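The restriction of the MDP's action space to DAG-admissible transitions can be pictured as action masking. The following is a minimal sketch under stated assumptions, not the authors' implementation: the state names, the `ADMISSIBLE` table, and the helper `masked_greedy_action` are all hypothetical, and the paper's actual DAG and learning algorithm are not given in this summary.

```python
from typing import Dict, Optional, Set

# Hypothetical investigation states and DAG-admissible transitions.
# These names are illustrative only; they do not come from the paper.
ADMISSIBLE: Dict[str, Set[str]] = {
    "alert": {"triage"},
    "triage": {"isolate_host", "dismiss"},
    "isolate_host": {"remediate"},
    "dismiss": set(),
    "remediate": set(),
}


def masked_greedy_action(state: str, q_values: Dict[str, float]) -> Optional[str]:
    """Return the highest-valued transition that the DAG permits from `state`.

    Transitions absent from ADMISSIBLE[state] are never considered, however
    large their estimated value. Returns None when nothing is admissible.
    """
    allowed = ADMISSIBLE.get(state, set())
    candidates = {action: q for action, q in q_values.items() if action in allowed}
    if not candidates:
        return None
    return max(candidates, key=lambda action: candidates[action])


# "remediate" has the largest value estimate but is not reachable from
# "triage", so the mask removes it before the argmax.
example_q = {"remediate": 9.0, "isolate_host": 1.0, "dismiss": 0.5}
chosen = masked_greedy_action("triage", example_q)
```

The point of the design is that the mask is applied before action selection, so a policy cannot pick a causally inconsistent transition even when its value estimate for that transition is high.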

What carries the argument

The Causal Multi-Agent Decision Framework (C-MADF) that compiles a learned Structural Causal Model into a causally restricted MDP and solves it with counterbalanced Blue- and Red-Team reinforcement-learning policies.

If this is right

Response actions remain confined to transitions that respect the learned causal structure, limiting the space of possible overreactions.
Inter-policy disagreement supplies an explicit, numeric signal that can trigger human escalation before an action is taken.
Performance gains appear on a real-world IoT dataset against three published baselines, indicating measurable reduction in false alarms.
The explainability score attaches directly to each decision, providing a transparent record of why a given response was chosen or withheld.
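To make the escalation idea concrete, here is a small sketch of a disagreement signal between two policies. It is an assumption-laden illustration: the summary does not define the Policy Divergence Score, so Jensen-Shannon divergence is swapped in as a stand-in, and the function names and the `threshold` value are hypothetical.

```python
import math
from typing import Sequence


def _kl_bits(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in bits, skipping zero-mass terms of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits; bounded between 0 and 1.

    Stand-in for the paper's Policy Divergence Score, whose formula is not
    given in this summary.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl_bits(p, mixture) + 0.5 * _kl_bits(q, mixture)


def needs_human_review(blue: Sequence[float], red: Sequence[float],
                       threshold: float = 0.3) -> bool:
    """Escalate when the two action distributions disagree beyond `threshold`.

    The threshold of 0.3 is an arbitrary illustrative value.
    """
    return js_divergence(blue, red) > threshold
```

With this stand-in, identical action distributions score 0 and fully opposed ones score 1, so a fixed threshold gives a simple rule for withholding an action pending review.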

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the causal graph can be refreshed from recent telemetry without retraining the entire policy, the framework could track slowly drifting threat landscapes.
The same constrained dual-agent pattern might transfer to other safety-critical control settings where actions must be both effective and causally justifiable.
Quantitative comparison of Policy Divergence Scores across threat categories could reveal which attack types produce the greatest policy uncertainty.

Load-bearing premise

The structural causal model extracted from past telemetry correctly reflects the true causal structure of live network environments, so that the compiled DAG includes every safe response path and excludes every unsafe one.

What would settle it

A controlled live test in which the system either blocks a legitimate response path or permits an unsafe action that the historical SCM did not anticipate would show that the learned causal graph fails to cover the true environment.

Figures

Figures reproduced from arXiv: 2604.04442 by Diksha Goel, Hussain Ahmad, Yiyao Zhang.

**Figure 1.** Illustration of a Shadow-Jitter telemetry manipulation scenario. Controlled perturbations in host and network logs distort apparent event correlations, …

**Figure 2.** The architecture of the Causal Multi-Agent Decision Framework (C-MADF). The process begins with the Causal Discovery Module learning a causal …

**Figure 3.** The MDP-DAG Investigation Roadmap showing valid state transitions (orange arrows with reward values) and causally inconsistent blocked transitions …

**Figure 4.** Council of Rivals adversarial deliberation architecture. At each investigation state …

**Figure 5.** Illustrative decomposition of the ETS into its three primary components, Clarity, Completeness, and Confidence, and their respective sub-metrics. The …

**Figure 6.** Illustration of the Shadow-Jitter attack scenario. Adversarial timing …

**Figure 7.** Defensive response under Shadow-Jitter injection: C-MADF maintains high detection with low false positives, structured human review, and strong …

Original abstract

Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.
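The headline numbers can be sanity-checked with a few lines of arithmetic. The sketch below uses only the figures reported in the abstract. The helper names are illustrative, and the relative-reduction figures are derived here, not reported by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional drop from `baseline` to `new`."""
    return (baseline - new) / baseline


# Values as reported in the abstract.
reported_precision = 0.997
reported_recall = 0.961
baseline_fpr_percent = (11.2, 9.7, 8.4)
cmadf_fpr_percent = 1.8

f1 = f1_score(reported_precision, reported_recall)
reductions = [relative_reduction(b, cmadf_fpr_percent) for b in baseline_fpr_percent]

print(round(f1, 3))                       # 0.979
print([round(r, 3) for r in reductions])  # [0.839, 0.814, 0.786]
```

The reported F1 of 0.979 is consistent with the reported precision and recall, and the false-positive rate falls by roughly 79 to 84 percent relative to the three baselines. This check says nothing about whether the underlying evaluation split is sound.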

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The performance numbers are strong but the causal DAG has no validation or ablation, so it's unclear if the SCM restrictions are doing any real work.

The main thing to know is that C-MADF reports a big drop in false positives on CICIoT2023, but without any test of the learned SCM or an ablation that removes the DAG constraint, those gains cannot be credited to the causal component rather than the RL setup itself. The abstract frames the work as a new structurally constrained architecture, yet the evidence for the structure actually mattering is missing from the provided text.

What stands out as new is the concrete assembly of an SCM-derived DAG to restrict MDP actions, paired with adversarial blue and red policies and a Policy Divergence Score for explainability. That specific combination is not a routine extension of the baselines cited. The paper does a reasonable job laying out the problem of ambiguous telemetry and over-reaction in autonomous defenses, and the dual-policy balance with a conservative red team is a straightforward way to add caution. The explainability interface tied to inter-policy disagreement is also a practical touch for a domain where operators need escalation signals.

The soft spots are concentrated in the causal claims. There is no domain-expert validation of the discovered DAG, no interventional check on whether the SCM recovered true edges instead of spurious correlations from the data, and no comparison against a non-causal multi-agent RL baseline on the same split. The fact that the SCM is learned from the same historical telemetry used for training and evaluation adds circularity risk.

This paper is aimed at researchers working on RL for network defense and on injecting structural constraints into autonomous agents. A reader focused on explainable decision systems in security would find the architecture and divergence metric worth examining. It deserves a serious referee because the core idea is coherent and the application area is high-stakes, even with the current empirical gaps. I would recommend sending it to review with requests for SCM validation, constraint ablations, and out-of-distribution testing.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Causal Multi-Agent Decision Framework (C-MADF) for autonomous cyber defense. It learns a Structural Causal Model (SCM) from historical telemetry, compiles it into a DAG to restrict the action space of an MDP, employs adversarial multi-agent RL with blue and red team policies, and uses policy divergence for explainability. On the CICIoT2023 dataset, it reports reducing FPR to 1.8% with high precision, recall, and F1 scores compared to baselines.

Significance. If the reported performance gains are attributable to the causal constraints rather than unconstrained RL or data fitting, this work could provide a valuable approach to incorporating structural causality into RL-based cyber defense systems, enhancing both performance and explainability in high-stakes environments.

major comments (3)

Abstract: The headline performance claim (FPR reduced to 1.8%, F1=0.979 on CICIoT2023) attributes gains to the SCM-derived DAG constraints on the MDP action space, but the text provides no validation of the learned SCM (e.g., interventional tests or domain-expert review), no ablation removing the DAG restriction, and no confirmation that action restrictions were enforced during RL training. This leaves the central causal contribution unverifiable.
Abstract: The evaluation trains RL policies on the same CICIoT2023 dataset used for reporting metrics and learns the SCM from historical telemetry; without an explicit train/test split or out-of-distribution test, the numbers risk circularity and cannot be credited to the causal component over standard RL fitting.
Abstract: No comparison is described against a non-causal multi-agent RL baseline trained and evaluated on the identical data split, which is required to isolate whether the DAG constraints (rather than the dual-policy architecture or other implementation details) drive the reported FPR reduction from 11.2%/9.7%/8.4% to 1.8%.

minor comments (2)

Abstract: The three literature baselines are not named or cited, preventing direct assessment of the comparison.
Abstract: The Explainability-Transparency Score and Policy Divergence Score are introduced without definitions or formulas, leaving their computation and use as escalation signals unclear.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where the causal contributions of C-MADF require stronger empirical support. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and experiments.

Referee: Abstract: The headline performance claim (FPR reduced to 1.8%, F1=0.979 on CICIoT2023) attributes gains to the SCM-derived DAG constraints on the MDP action space, but the text provides no validation of the learned SCM (e.g., interventional tests or domain-expert review), no ablation removing the DAG restriction, and no confirmation that action restrictions were enforced during RL training. This leaves the central causal contribution unverifiable.

Authors: We acknowledge that the manuscript does not presently include interventional validation of the SCM, domain-expert review, an ablation removing the DAG, or explicit confirmation of enforcement. In revision we will add a dedicated subsection on SCM validation (including any interventional checks feasible on the learned model), an ablation study comparing the full C-MADF against an otherwise identical version without DAG restrictions, and a clear description of how the constrained action space was implemented and enforced within the RL training loop.

Revision: yes

Referee: Abstract: The evaluation trains RL policies on the same CICIoT2023 dataset used for reporting metrics and learns the SCM from historical telemetry; without an explicit train/test split or out-of-distribution test, the numbers risk circularity and cannot be credited to the causal component over standard RL fitting.

Authors: The current text does not explicitly document the train/test partitioning. We will revise the evaluation section to state the precise splits used for SCM learning, policy training, and final reporting, and will add discussion of any steps taken to mitigate circularity (e.g., temporal separation of telemetry used for the SCM versus the RL episodes). If out-of-distribution testing is feasible with the available data we will include it; otherwise we will note the limitation.

Revision: yes

Referee: Abstract: No comparison is described against a non-causal multi-agent RL baseline trained and evaluated on the identical data split, which is required to isolate whether the DAG constraints (rather than the dual-policy architecture or other implementation details) drive the reported FPR reduction from 11.2%/9.7%/8.4% to 1.8%.

Authors: We agree that a controlled non-causal multi-agent RL baseline on the same data split is necessary to isolate the contribution of the DAG constraints. The existing literature baselines do not hold the dual-policy architecture fixed. In the revision we will implement and report results for a non-causal counterpart (identical dual-agent RL but with unrestricted action space) trained and evaluated on the identical split, thereby directly quantifying the effect of the causal restrictions.

Revision: yes
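The temporal separation mentioned in the second response could look something like the sketch below. It is a hypothetical illustration, not the authors' procedure: the function name, the split fractions, and the assumption that records are already sorted by timestamp are all introduced here.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_three_way_split(
    records: Sequence[T],
    scm_frac: float = 0.4,
    train_frac: float = 0.4,
) -> Tuple[List[T], List[T], List[T]]:
    """Split time-ordered records into contiguous SCM, RL-training, and test segments.

    Assumes `records` is already sorted by timestamp, oldest first. The
    earliest segment is used to learn the SCM, the middle one to train the
    policies, and the most recent one is held out for reporting. The default
    fractions are arbitrary illustrative values.
    """
    if scm_frac <= 0 or train_frac <= 0 or scm_frac + train_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test segment")
    n = len(records)
    scm_end = int(n * scm_frac)
    train_end = int(n * (scm_frac + train_frac))
    return (
        list(records[:scm_end]),
        list(records[scm_end:train_end]),
        list(records[train_end:]),
    )


# Integers stand in for timestamped telemetry records.
scm_part, train_part, test_part = temporal_three_way_split(list(range(10)))
```

Because the segments are contiguous and ordered, nothing in the held-out segment precedes the data used for causal discovery or policy training, which is one simple way to reduce the circularity risk the referee raises.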

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper describes an empirical architecture: learning an SCM from telemetry, compiling it to a DAG to constrain an MDP, and training dual adversarial RL policies. Performance numbers are presented as measured outcomes on the CICIoT2023 dataset. No equations or steps in the abstract reduce a claimed prediction or first-principles result to the inputs by construction, nor do any self-citations supply load-bearing uniqueness theorems. The process is standard data-driven modeling followed by evaluation; the reported metrics do not constitute a renamed fit or self-referential definition.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central performance claim rests on the learned SCM faithfully representing causal structure and on the RL policies converging to useful behavior inside the resulting constrained MDP; both steps are data-driven.

free parameters (2)

RL policy parameters
Blue-team and Red-team policies are optimized via reinforcement learning on the dataset, introducing fitted parameters whose values are not reported.
SCM parameters
The structural causal model is learned from historical telemetry, so its edge weights and conditional distributions are fitted quantities.

axioms (2)

domain assumption: The learned SCM is a faithful representation of the underlying causal mechanisms in the telemetry.
The admissible-action DAG is compiled directly from the SCM; any mismatch between the model and reality would invalidate the constraint set.
domain assumption: The MDP formulation with restricted transitions preserves all necessary defensive options.
The paper assumes the causal DAG does not eliminate any effective response that would have been safe.
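Both assumptions admit cheap structural checks on a compiled graph, although such checks cannot establish causal faithfulness. The sketch below is illustrative only: it uses Python's standard-library `graphlib` to confirm acyclicity and a breadth-first search to list required responses that the graph leaves unreachable. The graph contents and the helper names are hypothetical.

```python
from collections import deque
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set


def is_acyclic(successors: Dict[str, Set[str]]) -> bool:
    """True when the transition graph admits a topological order."""
    try:
        # static_order() raises CycleError once iteration begins on a cyclic graph.
        tuple(TopologicalSorter(successors).static_order())
        return True
    except CycleError:
        return False


def reachable(successors: Dict[str, Set[str]], start: str) -> Set[str]:
    """All states reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def missing_responses(successors: Dict[str, Set[str]], start: str,
                      required: Set[str]) -> Set[str]:
    """Required responses that the constrained graph can never reach from `start`."""
    return required - reachable(successors, start)


# Hypothetical compiled graph; "block_ip" is deliberately left unreachable.
example_graph = {
    "alert": {"triage"},
    "triage": {"isolate_host", "dismiss"},
    "isolate_host": {"remediate"},
}
gaps = missing_responses(example_graph, "alert", {"remediate", "block_ip"})
```

A non-empty result from `missing_responses` would indicate that the constraint set has removed a response an operator considers necessary, which is the failure mode the second assumption rules out by fiat.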

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AgenticVM: Agentic AI for Adaptive Software Vulnerability Management
cs.CR · 2026-05 · unverdicted · novelty 4.0
AgenticVM reduces vulnerability scanner alerts by up to 98% and predicts missing CVSS attributes with 89.3% accuracy using a multi-agent LLM framework integrated with security tools and public databases.

Comparative Analysis of Large Language Models in Healthcare
cs.CL · 2026-04 · unverdicted · novelty 3.0
Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.
