Explainable Autonomous Cyber Defense using Adversarial Multi-Agent Reinforcement Learning
Pith reviewed 2026-05-10 20:12 UTC · model grok-4.3
The pith
C-MADF learns a causal graph from telemetry to constrain an adversarial RL system, cutting the false-positive rate in autonomous cyber defense to a reported 1.8% on the CICIoT2023 IoT dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
C-MADF learns a Structural Causal Model from historical telemetry, compiles it into an investigation-level DAG that restricts admissible transitions, and solves the resulting constrained MDP with a dual-agent RL system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Policy divergence is quantified by a Policy Divergence Score and surfaced through an Explainability-Transparency Score. On the CICIoT2023 dataset the system reports 0.997 precision, 0.961 recall, 0.979 F1-score and a false-positive rate of 1.8 percent, down from 11.2 percent, 9.7 percent and 8.4 percent in three literature baselines.
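As a quick arithmetic check on the reported figures, the sketch below recomputes F1 from the stated precision and recall. It also derives the attack-to-benign ratio that the three numbers jointly imply. That second step is an assumption of this note, not something the paper states: it treats the task as binary attack-versus-benign classification. The variable names are illustrative only.

```python
# Reported values from the claim above.
precision, recall, fpr = 0.997, 0.961, 0.018

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Assumption (not from the paper): binary attack-vs-benign classification, so
#   precision = TP / (TP + FP), with TP = recall * P and FP = fpr * N.
# Solving for the positive-to-negative ratio P / N gives:
pos_neg_ratio = precision * fpr / (recall * (1 - precision))

print(f"F1 = {f1:.3f}")  # F1 = 0.979
print(f"implied attack:benign ratio = {pos_neg_ratio:.1f}:1")  # 6.2:1
```

The recomputed F1 matches the reported 0.979. Under the binary assumption, the three figures are mutually consistent only if attack samples outnumber benign ones by roughly six to one in the evaluation set, which is worth keeping in mind when reading the precision figure.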
What carries the argument
The Causal Multi-Agent Decision Framework (C-MADF) that compiles a learned Structural Causal Model into a causally restricted MDP and solves it with counterbalanced Blue- and Red-Team reinforcement-learning policies.
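A minimal sketch of what "causally restricted" can mean in practice, assuming the DAG is stored as an adjacency map and the policy is a simple value table. The state names, the `ADMISSIBLE` map, and the `masked_greedy` helper are hypothetical; the paper's actual enforcement mechanism is not described in the text reviewed here.

```python
# Hypothetical investigation-level DAG: state -> admissible next states.
ADMISSIBLE = {
    "alert": ["inspect_flow", "ignore"],
    "inspect_flow": ["isolate_device", "ignore"],
    "isolate_device": [],
    "ignore": [],
}

def admissible_actions(state):
    """Return only the transitions the DAG allows from `state`."""
    return ADMISSIBLE.get(state, [])

def masked_greedy(state, q_values):
    """Pick the highest-valued action among the admissible ones only."""
    allowed = admissible_actions(state)
    if not allowed:
        return None  # terminal node: nothing left to do
    return max(allowed, key=lambda a: q_values.get((state, a), 0.0))

# Illustrative value table: the top-valued action is not admissible from "alert".
q = {
    ("alert", "isolate_device"): 9.0,
    ("alert", "inspect_flow"): 1.0,
    ("alert", "ignore"): 0.2,
}
choice = masked_greedy("alert", q)
print(choice)  # inspect_flow
```

The point of the mask is that a high value estimate for an inadmissible action never reaches the environment: the agent cannot jump from an alert straight to isolation without the intermediate investigation step.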
If this is right
- Response actions remain confined to transitions that respect the learned causal structure, limiting the space of possible overreactions.
- Inter-policy disagreement supplies an explicit, numeric signal that can trigger human escalation before an action is taken.
- Performance gains appear on a real-world IoT dataset against three published baselines, indicating measurable reduction in false alarms.
- The explainability score attaches directly to each decision, providing a transparent record of why a given response was chosen or withheld.
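The paper does not define the Policy Divergence Score in the text reviewed here, so the sketch below substitutes a stand-in: the Jensen-Shannon divergence between the two policies' action distributions, with a hypothetical escalation threshold of 0.3. Both the choice of divergence and the threshold are assumptions made for illustration.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def should_escalate(blue_probs, red_probs, threshold=0.3):
    """Return (score, flag); flag is True when the policies disagree strongly."""
    score = js_divergence(blue_probs, red_probs)
    return score, score > threshold

# Hypothetical action distributions over [isolate, ignore].
blue = [0.9, 0.1]  # threat-optimizing policy favors isolation
red = [0.2, 0.8]   # conservatively shaped policy favors inaction
score, escalate = should_escalate(blue, red)
print(f"divergence = {score:.3f}, escalate = {escalate}")
```

A bounded, symmetric measure is convenient here because a fixed threshold then has the same meaning regardless of which policy is treated as the reference.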
Where Pith is reading between the lines
- If the causal graph can be refreshed from recent telemetry without retraining the entire policy, the framework could track slowly drifting threat landscapes.
- The same constrained dual-agent pattern might transfer to other safety-critical control settings where actions must be both effective and causally justifiable.
- Quantitative comparison of Policy Divergence Scores across threat categories could reveal which attack types produce the greatest policy uncertainty.
Load-bearing premise
The structural causal model extracted from past telemetry correctly reflects the true causal structure of live network environments, so that the compiled DAG includes every safe response path and excludes every unsafe one.
What would settle it
A controlled live test in which the system either blocks a legitimate response path or permits an unsafe action that the historical SCM did not anticipate would show that the learned causal graph fails to cover the true environment.
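One way to operationalize such a test, sketched under the assumption that both the DAG and the live observations can be expressed as (state, action) pairs. The edge set, the observation log, and the analyst-labeled unsafe set below are all hypothetical.

```python
def uncovered_transitions(dag_edges, observed):
    """Legitimate observed transitions that the learned DAG does not admit."""
    return sorted(set(observed) - set(dag_edges))

def permitted_unsafe(dag_edges, unsafe):
    """Transitions the DAG admits even though they are labeled unsafe."""
    return sorted(set(dag_edges) & set(unsafe))

# Hypothetical learned DAG edges.
dag_edges = {
    ("alert", "inspect_flow"),
    ("inspect_flow", "isolate_device"),
    ("inspect_flow", "shutdown_gateway"),
}
# Hypothetical legitimate responses seen in the live test.
observed = [("alert", "inspect_flow"), ("alert", "block_subnet")]
# Hypothetical analyst-labeled unsafe responses.
unsafe = {("inspect_flow", "shutdown_gateway")}

blocked = uncovered_transitions(dag_edges, observed)
leaked = permitted_unsafe(dag_edges, unsafe)
print("legitimate but blocked:", blocked)
print("unsafe but permitted:", leaked)
```

Either list being non-empty would count as evidence against the load-bearing premise: the first shows a missing safe path, the second an admitted unsafe one.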
Original abstract
Autonomous agents are increasingly deployed in both offensive and defensive cyber operations, creating high-speed, closed-loop interactions in critical infrastructure environments. Advanced Persistent Threat (APT) actors exploit "Living off the Land" techniques and targeted telemetry perturbations to induce ambiguity in monitoring systems, causing automated defenses to overreact or misclassify benign behavior as malicious activity. Existing monolithic and multi-agent defense pipelines largely operate on correlation-based signals, lack structural constraints on response actions, and are vulnerable to reasoning drift under ambiguous or adversarial inputs. We present the Causal Multi-Agent Decision Framework (C-MADF), a structurally constrained architecture for autonomous cyber defense that integrates causal modeling with adversarial dual-policy control. C-MADF first learns a Structural Causal Model (SCM) from historical telemetry and compiles it into an investigation-level Directed Acyclic Graph (DAG) that defines admissible response transitions. This roadmap is formalized as a Markov Decision Process (MDP) whose action space is explicitly restricted to causally consistent transitions. Decision-making within this constrained space is performed by a dual-agent reinforcement learning system in which a threat-optimizing Blue-Team policy is counterbalanced by a conservatively shaped Red-Team policy. Inter-policy disagreement is quantified through a Policy Divergence Score and exposed via a human-in-the-loop interface equipped with an Explainability-Transparency Score that serves as an escalation signal under uncertainty. On the real-world CICIoT2023 dataset, C-MADF reduces the false-positive rate from 11.2%, 9.7%, and 8.4% in three cutting-edge literature baselines to 1.8%, while achieving 0.997 precision, 0.961 recall, and 0.979 F1-score.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Causal Multi-Agent Decision Framework (C-MADF) for autonomous cyber defense. It learns a Structural Causal Model (SCM) from historical telemetry, compiles it into a DAG that restricts the action space of an MDP, trains counterbalanced Blue-Team and Red-Team RL policies within that space, and uses the divergence between the two policies as an explainability and escalation signal. On the CICIoT2023 dataset, it reports a false-positive rate of 1.8% (against 11.2%, 9.7%, and 8.4% for three literature baselines), with 0.997 precision, 0.961 recall, and 0.979 F1-score.
Significance. If the reported performance gains are attributable to the causal constraints rather than unconstrained RL or data fitting, this work could provide a valuable approach to incorporating structural causality into RL-based cyber defense systems, enhancing both performance and explainability in high-stakes environments.
major comments (3)
- Abstract: The headline performance claim (FPR reduced to 1.8%, F1=0.979 on CICIoT2023) attributes gains to the SCM-derived DAG constraints on the MDP action space, but the text provides no validation of the learned SCM (e.g., interventional tests or domain-expert review), no ablation removing the DAG restriction, and no confirmation that action restrictions were enforced during RL training. This leaves the central causal contribution unverifiable.
- Abstract: The evaluation trains RL policies on the same CICIoT2023 dataset used for reporting metrics and learns the SCM from historical telemetry; without an explicit train/test split or out-of-distribution test, the numbers risk circularity and cannot be credited to the causal component over standard RL fitting.
- Abstract: No comparison is described against a non-causal multi-agent RL baseline trained and evaluated on the identical data split, which is required to isolate whether the DAG constraints (rather than the dual-policy architecture or other implementation details) drive the reported FPR reduction from 11.2%/9.7%/8.4% to 1.8%.
minor comments (2)
- Abstract: The three literature baselines are not named or cited, preventing direct assessment of the comparison.
- Abstract: The Explainability-Transparency Score and Policy Divergence Score are introduced without definitions or formulas, leaving their computation and use as escalation signals unclear.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which identifies key areas where the causal contributions of C-MADF require stronger empirical support. We address each major comment below and will revise the manuscript to incorporate the suggested clarifications and experiments.
Point-by-point responses
- Referee: Abstract: The headline performance claim (FPR reduced to 1.8%, F1=0.979 on CICIoT2023) attributes gains to the SCM-derived DAG constraints on the MDP action space, but the text provides no validation of the learned SCM (e.g., interventional tests or domain-expert review), no ablation removing the DAG restriction, and no confirmation that action restrictions were enforced during RL training. This leaves the central causal contribution unverifiable.
Authors: We acknowledge that the manuscript does not presently include interventional validation of the SCM, domain-expert review, an ablation removing the DAG, or explicit confirmation of enforcement. In revision we will add a dedicated subsection on SCM validation (including any interventional checks feasible on the learned model), an ablation study comparing the full C-MADF against an otherwise identical version without DAG restrictions, and a clear description of how the constrained action space was implemented and enforced within the RL training loop. revision: yes
- Referee: Abstract: The evaluation trains RL policies on the same CICIoT2023 dataset used for reporting metrics and learns the SCM from historical telemetry; without an explicit train/test split or out-of-distribution test, the numbers risk circularity and cannot be credited to the causal component over standard RL fitting.
Authors: The current text does not explicitly document the train/test partitioning. We will revise the evaluation section to state the precise splits used for SCM learning, policy training, and final reporting, and will add discussion of any steps taken to mitigate circularity (e.g., temporal separation of telemetry used for the SCM versus the RL episodes). If out-of-distribution testing is feasible with the available data we will include it; otherwise we will note the limitation. revision: yes
- Referee: Abstract: No comparison is described against a non-causal multi-agent RL baseline trained and evaluated on the identical data split, which is required to isolate whether the DAG constraints (rather than the dual-policy architecture or other implementation details) drive the reported FPR reduction from 11.2%/9.7%/8.4% to 1.8%.
Authors: We agree that a controlled non-causal multi-agent RL baseline on the same data split is necessary to isolate the contribution of the DAG constraints. The existing literature baselines do not hold the dual-policy architecture fixed. In the revision we will implement and report results for a non-causal counterpart (identical dual-agent RL but with unrestricted action space) trained and evaluated on the identical split, thereby directly quantifying the effect of the causal restrictions. revision: yes
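The temporal separation raised in the second response can be sketched as a chronological three-way split, so that the SCM, the policies, and the reported metrics each see disjoint, time-ordered slices. This assumes every record carries a sortable timestamp; the `ts` field, the fractions, and the `temporal_split` helper are hypothetical and not taken from the paper.

```python
def temporal_split(records, scm_frac=0.4, train_frac=0.4):
    """Split records chronologically into SCM, RL-training, and test slices."""
    ordered = sorted(records, key=lambda r: r["ts"])
    n = len(ordered)
    scm_end = int(n * scm_frac)
    train_end = scm_end + int(n * train_frac)
    return ordered[:scm_end], ordered[scm_end:train_end], ordered[train_end:]

# Hypothetical telemetry records, deliberately out of order.
records = [{"ts": t} for t in [5, 1, 9, 3, 7, 0, 2, 8, 6, 4]]
scm_part, train_part, test_part = temporal_split(records)

print([r["ts"] for r in scm_part])    # oldest slice: used to learn the SCM
print([r["ts"] for r in train_part])  # middle slice: used for RL episodes
print([r["ts"] for r in test_part])   # newest slice: used only for reporting
```

Running the constrained and unconstrained dual-agent variants on the same three slices would then give the controlled comparison the third response promises.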
Circularity Check
No circularity in derivation chain
The paper describes an empirical architecture: learning an SCM from telemetry, compiling it to a DAG to constrain an MDP, and training dual adversarial RL policies. Performance numbers are presented as measured outcomes on the CICIoT2023 dataset. No equations or steps in the abstract reduce a claimed prediction or first-principles result to the inputs by construction, nor do any self-citations supply load-bearing uniqueness theorems. The process is standard data-driven modeling followed by evaluation; the reported metrics do not constitute a renamed fit or self-referential definition.
Axiom & Free-Parameter Ledger
free parameters (2)
- RL policy parameters
- SCM parameters
axioms (2)
- domain assumption: The learned SCM is a faithful representation of the underlying causal mechanisms in the telemetry.
- domain assumption: The MDP formulation with restricted transitions preserves all necessary defensive options.
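The second assumption is checkable in a limited sense: given a list of responses that defenders consider necessary, a reachability pass over the DAG shows whether any of them has been cut off. The sketch below uses a breadth-first search over a hypothetical adjacency map; the node names and the `required` set are illustrative assumptions.

```python
from collections import deque

def reachable(dag, start):
    """Return every node reachable from `start` via breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in dag.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def missing_responses(dag, start, required):
    """Required responses that no admissible path from `start` can reach."""
    return sorted(set(required) - reachable(dag, start))

# Hypothetical compiled DAG and a hypothetical list of must-have responses.
dag = {
    "alert": ["inspect_flow", "ignore"],
    "inspect_flow": ["isolate_device"],
}
required = {"isolate_device", "rate_limit"}

missing = missing_responses(dag, "alert", required)
print(missing)  # ['rate_limit']
```

A non-empty result does not prove the SCM is wrong, but it does show that the compiled DAG excludes a response the operators expected to have available.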
Forward citations
Cited by 2 Pith papers
- AgenticVM: Agentic AI for Adaptive Software Vulnerability Management
  AgenticVM reduces vulnerability scanner alerts by up to 98% and predicts missing CVSS attributes with 89.3% accuracy using a multi-agent LLM framework integrated with security tools and public databases.
- Comparative Analysis of Large Language Models in Healthcare
  Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.