pith. machine review for the scientific record.

arXiv: 2604.04442 · v1 · submitted 2026-04-06 · 💻 cs.CR · cs.LG · cs.MA

Recognition: no theorem link

Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning

Pith reviewed 2026-05-10 20:12 UTC · model grok-4.3

classification 💻 cs.CR cs.LG cs.MA
keywords causal modeling, multi-agent reinforcement learning, autonomous cyber defense, structural causal model, explainable AI, false positive reduction, adversarial policies, IoT security

The pith

C-MADF learns a causal graph from telemetry to constrain an adversarial RL system, cutting the reported false-positive rate in autonomous cyber defense to 1.8% on the CICIoT2023 IoT dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents C-MADF, a framework that first extracts a Structural Causal Model from historical network telemetry and turns it into a directed acyclic graph of permitted response actions. This graph defines the admissible action space of a Markov decision process solved by two opposing reinforcement-learning agents: one that pushes for decisive threat responses and another that enforces conservative limits. Disagreement between the agents is quantified and surfaced through an explainability score that flags uncertain decisions for human review. The authors report that this causal-plus-adversarial design sharply reduces the overreactions that plague correlation-based defenses when inputs are ambiguous or perturbed.
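
The constraint described above can be pictured as action masking over a transition graph. The following is a minimal sketch, not the authors' implementation: the state names, the `ADMISSIBLE` table, and the value estimates are all hypothetical, and this summary does not specify how the paper compiles its DAG or which RL algorithm it uses.

```python
# Hypothetical investigation-level DAG: state -> states it may transition to.
# State names are illustrative only; they do not come from the paper.
ADMISSIBLE = {
    "alert": ["triage"],
    "triage": ["collect_evidence", "close_benign"],
    "collect_evidence": ["isolate_host", "escalate_to_human"],
    "isolate_host": [],
    "escalate_to_human": [],
    "close_benign": [],
}


def admissible_actions(state):
    """Return the transitions the DAG allows from `state`."""
    return ADMISSIBLE.get(state, [])


def masked_greedy_action(state, q_values):
    """Pick the highest-valued action among admissible ones only.

    `q_values` maps candidate next states to estimated values; entries for
    transitions outside the DAG are ignored, however large they are.
    """
    allowed = admissible_actions(state)
    if not allowed:
        return None  # terminal state: nothing left to do
    return max(allowed, key=lambda a: q_values.get(a, float("-inf")))


# An unconstrained policy would jump straight to isolation from triage;
# the mask confines the choice to admissible transitions.
q = {"isolate_host": 9.0, "collect_evidence": 2.0, "close_benign": 1.5}
choice = masked_greedy_action("triage", q)
```

Here `choice` is `"collect_evidence"` even though `"isolate_host"` has the largest value estimate, because isolation is not reachable from triage in this toy graph.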

Core claim

C-MADF learns a Structural Causal Model from historical telemetry, compiles it into an investigation-level DAG that restricts admissible transitions, and solves the resulting constrained MDP with a dual-agent RL system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Policy divergence is quantified by a Policy Divergence Score and surfaced through an Explainability-Transparency Score. On the CICIoT2023 dataset the system reports 0.997 precision, 0.961 recall, 0.979 F1-score and a false-positive rate of 1.8 percent, down from 11.2 percent, 9.7 percent and 8.4 percent in three literature baselines.
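
As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall, so the three numbers can be cross-checked against each other. The snippet uses only the values quoted above; the function and variable names are ours.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported_precision = 0.997
reported_recall = 0.961
f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # prints 0.979
```

The result agrees with the reported F1-score of 0.979 to three decimals, so the headline metrics are at least internally consistent.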

What carries the argument

The Causal Multi-Agent Decision Framework (C-MADF), which compiles a learned Structural Causal Model into a causally restricted MDP and solves it with counterbalanced Blue-Team and Red-Team reinforcement-learning policies.

If this is right

  • Response actions remain confined to transitions that respect the learned causal structure, limiting the space of possible overreactions.
  • Inter-policy disagreement supplies an explicit, numeric signal that can trigger human escalation before an action is taken.
  • Performance gains appear on the real-world CICIoT2023 IoT dataset against three published baselines, indicating a measurable reduction in false alarms.
  • The explainability score attaches directly to each decision, providing a transparent record of why a given response was chosen or withheld.
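
One way to picture the disagreement signal in the second point is a divergence between the two policies' action distributions, compared against a threshold. This summary does not give the paper's formula for the Policy Divergence Score, so the sketch below substitutes Jensen-Shannon divergence as a stand-in; the threshold and the toy distributions are likewise assumptions.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; skips zero-mass terms of p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def needs_human_review(blue_probs, red_probs, threshold=0.3):
    """Flag a decision when the two policies disagree beyond `threshold`.

    The threshold value is an arbitrary placeholder, not a figure from the paper.
    """
    return js_divergence(blue_probs, red_probs) > threshold


# Action distributions over the same three admissible actions (toy numbers).
blue = [0.90, 0.05, 0.05]   # decisive policy favours the first action
red = [0.10, 0.10, 0.80]    # conservative policy favours the third
agree = [0.88, 0.07, 0.05]  # a second policy that nearly matches `blue`

escalate_conflict = needs_human_review(blue, red)
escalate_agreement = needs_human_review(blue, agree)
```

With these toy inputs the conflicting pair is escalated and the near-identical pair is not. A bounded, symmetric divergence is convenient here because a single fixed threshold then has the same meaning regardless of which policy is treated as the reference.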

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the causal graph can be refreshed from recent telemetry without retraining the entire policy, the framework could track slowly drifting threat landscapes.
  • The same constrained dual-agent pattern might transfer to other safety-critical control settings where actions must be both effective and causally justifiable.
  • Quantitative comparison of Policy Divergence Scores across threat categories could reveal which attack types produce the greatest policy uncertainty.

Load-bearing premise

The structural causal model extracted from past telemetry correctly reflects the true causal structure of live network environments, so that the compiled DAG includes every safe response path and excludes every unsafe one.

What would settle it

A controlled live test in which the system either blocks a legitimate response path or permits an unsafe action that the historical SCM did not anticipate would show that the learned causal graph fails to cover the true environment.

Figures

Figures reproduced from arXiv: 2604.04442 by Diksha Goel, Hussain Ahmad, Yiyao Zhang.

Figure 1: Illustration of a Shadow-Jitter telemetry manipulation scenario. Controlled perturbations in host and network logs distort apparent event correlations, …
Figure 2: The architecture of the Causal Multi-Agent Decision Framework (C-MADF). The process begins with the Causal Discovery Module learning a causal …
Figure 3: The MDP-DAG Investigation Roadmap showing valid state transitions (orange arrows with reward values) and causally inconsistent blocked transitions …
Figure 4: Council of Rivals adversarial deliberation architecture. At each investigation state …
Figure 5: Illustrative decomposition of the ETS into its three primary components, Clarity, Completeness, and Confidence, and their respective sub-metrics. The …
Figure 6: Illustration of the Shadow-Jitter attack scenario. Adversarial timing …
Figure 7: Defensive response under Shadow-Jitter injection: C-MADF maintains high detection with low false positives, structured human review, and strong …
Original abstract

Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Causal Multi-Agent Decision Framework (C-MADF) for autonomous cyber defense. It learns a Structural Causal Model (SCM) from historical telemetry, compiles it into a DAG that restricts the action space of an MDP, trains adversarial Blue-Team and Red-Team RL policies within that space, and uses the divergence between the two policies as an explainability signal. On the CICIoT2023 dataset, it reports a false-positive rate (FPR) of 1.8% together with higher precision, recall, and F1 than three literature baselines.

Significance. If the reported performance gains are attributable to the causal constraints rather than unconstrained RL or data fitting, this work could provide a valuable approach to incorporating structural causality into RL-based cyber defense systems, enhancing both performance and explainability in high-stakes environments.

major comments (3)
  1. Abstract: The headline performance claim (FPR reduced to 1.8%, F1=0.979 on CICIoT2023) attributes gains to the SCM-derived DAG constraints on the MDP action space, but the text provides no validation of the learned SCM (e.g., interventional tests or domain-expert review), no ablation removing the DAG restriction, and no confirmation that action restrictions were enforced during RL training. This leaves the central causal contribution unverifiable.
  2. Abstract: The evaluation trains RL policies on the same CICIoT2023 dataset used for reporting metrics and learns the SCM from historical telemetry; without an explicit train/test split or out-of-distribution test, the numbers risk circularity and cannot be credited to the causal component over standard RL fitting.
  3. Abstract: No comparison is described against a non-causal multi-agent RL baseline trained and evaluated on the identical data split, which is required to isolate whether the DAG constraints (rather than the dual-policy architecture or other implementation details) drive the reported FPR reduction from 11.2%/9.7%/8.4% to 1.8%.
minor comments (2)
  1. Abstract: The three literature baselines are not named or cited, preventing direct assessment of the comparison.
  2. Abstract: The Explainability-Transparency Score and Policy Divergence Score are introduced without definitions or formulas, leaving their computation and use as escalation signals unclear.
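
The ablation and the non-causal baseline requested in the major comments both come down to computing the same metric for arms that differ only in whether the DAG restriction is applied, on one shared held-out set. A minimal sketch of that comparison follows; the arm names, labels, and predictions are toy values for illustration, not results from the paper.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), with 1 = malicious and 0 = benign."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no benign samples: FPR is undefined")
    return fp / (fp + tn)


def compare_arms(y_true, predictions_by_arm):
    """Report FPR for each arm evaluated on the same held-out labels."""
    return {arm: false_positive_rate(y_true, preds)
            for arm, preds in predictions_by_arm.items()}


# Toy held-out labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
arms = {
    "dag_constrained": [0, 0, 0, 0, 1, 1, 0, 0],
    "unconstrained":   [0, 1, 0, 1, 1, 1, 0, 1],
}
fpr_by_arm = compare_arms(y_true, arms)
```

Holding the labels fixed across arms is the point of the design: any difference in `fpr_by_arm` can then be attributed to the arms themselves and not to a change in the evaluation data.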

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where the causal contributions of C-MADF require stronger empirical support. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and experiments.

Point-by-point responses
  1. Referee: Abstract: The headline performance claim (FPR reduced to 1.8%, F1=0.979 on CICIoT2023) attributes gains to the SCM-derived DAG constraints on the MDP action space, but the text provides no validation of the learned SCM (e.g., interventional tests or domain-expert review), no ablation removing the DAG restriction, and no confirmation that action restrictions were enforced during RL training. This leaves the central causal contribution unverifiable.

    Authors: We acknowledge that the manuscript does not presently include interventional validation of the SCM, domain-expert review, an ablation removing the DAG, or explicit confirmation of enforcement. In revision we will add a dedicated subsection on SCM validation (including any interventional checks feasible on the learned model), an ablation study comparing the full C-MADF against an otherwise identical version without DAG restrictions, and a clear description of how the constrained action space was implemented and enforced within the RL training loop.
    Revision: yes

  2. Referee: Abstract: The evaluation trains RL policies on the same CICIoT2023 dataset used for reporting metrics and learns the SCM from historical telemetry; without an explicit train/test split or out-of-distribution test, the numbers risk circularity and cannot be credited to the causal component over standard RL fitting.

    Authors: The current text does not explicitly document the train/test partitioning. We will revise the evaluation section to state the precise splits used for SCM learning, policy training, and final reporting, and will add discussion of any steps taken to mitigate circularity (e.g., temporal separation of telemetry used for the SCM versus the RL episodes). If out-of-distribution testing is feasible with the available data we will include it; otherwise we will note the limitation.
    Revision: yes

  3. Referee: Abstract: No comparison is described against a non-causal multi-agent RL baseline trained and evaluated on the identical data split, which is required to isolate whether the DAG constraints (rather than the dual-policy architecture or other implementation details) drive the reported FPR reduction from 11.2%/9.7%/8.4% to 1.8%.

    Authors: We agree that a controlled non-causal multi-agent RL baseline on the same data split is necessary to isolate the contribution of the DAG constraints. The existing literature baselines do not hold the dual-policy architecture fixed. In the revision we will implement and report results for a non-causal counterpart (identical dual-agent RL but with unrestricted action space) trained and evaluated on the identical split, thereby directly quantifying the effect of the causal restrictions.
    Revision: yes
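
The temporal separation mentioned in the second response could look like the sketch below, which partitions time-stamped telemetry into three disjoint windows for SCM learning, policy training, and evaluation. The function name, cut-off parameters, and record layout are hypothetical; the paper's actual partitioning is not documented in this summary.

```python
from datetime import datetime


def temporal_split(records, scm_end, train_end):
    """Partition time-stamped records into three disjoint, ordered windows.

    records   : iterable of (timestamp, payload) tuples
    scm_end   : records strictly before this go to causal discovery
    train_end : records in [scm_end, train_end) go to policy training;
                everything from train_end onward is held out for evaluation
    """
    if scm_end >= train_end:
        raise ValueError("scm_end must precede train_end")
    scm, train, test = [], [], []
    for ts, payload in sorted(records, key=lambda r: r[0]):
        if ts < scm_end:
            scm.append((ts, payload))
        elif ts < train_end:
            train.append((ts, payload))
        else:
            test.append((ts, payload))
    return scm, train, test


# Toy telemetry: one record per day (dates and payloads are made up).
records = [(datetime(2023, 1, day), f"flow-{day}") for day in range(1, 11)]
scm_set, train_set, test_set = temporal_split(
    records, scm_end=datetime(2023, 1, 5), train_end=datetime(2023, 1, 8)
)
```

Splitting by time instead of at random keeps every evaluation record later than anything the SCM or the policies saw, which is the property a circularity argument would need.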

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper describes an empirical architecture: learning an SCM from telemetry, compiling it to a DAG to constrain an MDP, and training dual adversarial RL policies. Performance numbers are presented as measured outcomes on the CICIoT2023 dataset. No equations or steps in the abstract reduce a claimed prediction or first-principles result to the inputs by construction, nor do any self-citations supply load-bearing uniqueness theorems. The process is standard data-driven modeling followed by evaluation; the reported metrics do not constitute a renamed fit or self-referential definition.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central performance claim rests on the learned SCM faithfully representing causal structure and on the RL policies converging to useful behavior inside the resulting constrained MDP; both steps are data-driven.

free parameters (2)
  • RL policy parameters
    Blue-team and Red-team policies are optimized via reinforcement learning on the dataset, introducing fitted parameters whose values are not reported.
  • SCM parameters
    The structural causal model is learned from historical telemetry, so its edge weights and conditional distributions are fitted quantities.
axioms (2)
  • domain assumption: The learned SCM is a faithful representation of the underlying causal mechanisms in the telemetry.
    The admissible-action DAG is compiled directly from the SCM; any mismatch between the model and reality would invalidate the constraint set.
  • domain assumption: The MDP formulation with restricted transitions preserves all necessary defensive options.
    The paper assumes the causal DAG does not eliminate any effective response that would have been safe.
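
The second assumption can be audited mechanically once a list of required responses exists: every such response should be reachable through admissible transitions. The sketch below runs a breadth-first search over a hypothetical compiled graph; the state names and the `required` list are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def reachable_states(transitions, start):
    """Breadth-first search over admissible transitions from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def missing_responses(transitions, start, required):
    """Return required response states the constrained MDP can never reach."""
    return sorted(set(required) - reachable_states(transitions, start))


# Hypothetical compiled graph in which one response was pruned away.
transitions = {
    "alert": ["triage"],
    "triage": ["collect_evidence"],
    "collect_evidence": ["isolate_host"],
}
required = ["isolate_host", "block_ip"]
gaps = missing_responses(transitions, "alert", required)
```

In this toy graph `gaps` contains `"block_ip"`, the kind of silently removed option the assumption rules out. The check says nothing about whether the remaining paths are safe, only whether they exist.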

pith-pipeline@v0.9.0 · 5615 in / 1584 out tokens · 63698 ms · 2026-05-10T20:12:21.042006+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AgenticVM: Agentic AI for Adaptive Software Vulnerability Management

    cs.CR 2026-05 unverdicted novelty 4.0

    AgenticVM reduces vulnerability scanner alerts by up to 98% and predicts missing CVSS attributes with 89.3% accuracy using a multi-agent LLM framework integrated with security tools and public databases.

  2. Comparative Analysis of Large Language Models in Healthcare

    cs.CL 2026-04 unverdicted novelty 3.0

    Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.
