pith. sign in

arxiv: 1907.01396 · v1 · pith:B2RT3GU6new · submitted 2019-07-01 · 💻 cs.CR

Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense

Pith reviewed 2026-05-25 12:20 UTC · model grok-4.3

classification 💻 cs.CR
keywords strategic learningcyber defensedefensive deceptionmoving target defensehoneypot engagementuncertainty reductionpolicy convergenceautonomous adaptation
0
0 comments X

The pith

Strategic learning enables three cyber defense schemes to converge to optimal policies under progressive uncertainty.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes three defense schemes that actively engage attackers to raise their costs and collect information, forming an active, autonomous, and adaptive cyber defense approach. These schemes—defensive deception, feedback-driven moving target defense, and adaptive honeypot engagement—operate under three increasing levels of uncertainty about parameters, payoffs, and the full environment. To address the unknowns, the paper applies three tailored strategic learning schemes that all rely on the same feedback loop of sensation, estimation, and actions. This loop reinforces policies that yield better outcomes until they reach optimal performance in an autonomous way. The work also aims to quantify the resulting security gains against the costs of running the defenses.

Core claim

The paper claims that the three defense schemes, facing parameter uncertainty, payoff uncertainty, and environmental uncertainty in turn, each use a distinct strategic learning scheme, yet all three schemes share an identical feedback structure of sensation, estimation, and actions. Through this structure the most rewarding policies are reinforced repeatedly, allowing them to converge autonomously and adaptively to the optimal policies even when the defender lacks complete knowledge of the attacker.

What carries the argument

The shared feedback structure of sensation, estimation, and actions that reinforces rewarding policies until convergence in each of the three learning schemes.

If this is right

  • Defensive deception can detect and counter attacks while handling parameter uncertainty.
  • Moving target defense can adapt dynamically under payoff uncertainty through feedback.
  • Honeypot engagement can manage environmental uncertainty by learning from interactions.
  • All schemes increase attacker costs while gathering threat information without full prior knowledge.
  • The security benefits of the defenses can be weighed against their implementation costs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The common feedback structure may allow the same learning pattern to be reused in other security settings that involve partial information about opponents.
  • Convergence under uncertainty could support defenses that update in real time as attacker tactics evolve.
  • The quantified security-cost tradeoff might guide decisions on when to deploy active measures versus static ones.

Load-bearing premise

The learning schemes can estimate the unknowns and reduce uncertainty enough for the policies to converge to the optimal ones.

What would settle it

A simulation or deployment test in which policies under one of the three schemes fail to converge when uncertainty stays high despite repeated application of the sensation-estimation-action feedback.

Figures

Figures reproduced from arXiv: 1907.01396 by Linan Huang, Quanyan Zhu.

Figure 3
Figure 3. Figure 3: The multistage structure of APT kill chain is composed of reconnaissance, [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: a has an attack surface π1(c1,1) = {v1,1, v1,2} while configuration c1,2 in Fig. 5b reveals two vulnerabilities v1,2, v1,3 ∈ π1(c1,2). Then, if the attacker takes action a1,1 and the defender changes the configuration from c1,1 to c1,2, the attack is deterred at the first layer. v1,1 v1,2 v1,3 v2,1 v2,2 v2,3 v3,1 v3,2 v3,3 v4,1 v4,2 v4,3 c1,1 c2,1 c3,1 c4,2 (a) Attack surface π1(c1,1) = {v1,1, v1,2 }. v1,1… view at source ↗
Figure 8
Figure 8. Figure 8: The honeynet in red emulates and shares the same structure as the targeted [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗
Figure 11
Figure 11. Figure 11: Convergence results of Q-learning over SMDP. 5 Conclusion and Discussion This chapter has introduced three defense schemes, i.e., defensive deception to detect and counter adversarial deception, feedback-driven Moving Target Defense (MTD) to increase the attacker’s probing and reconnaissance costs, and adaptive honeypot engagement to gather fundamental threat information. These schemes satisfy the Princip… view at source ↗
read the original abstract

The increasing instances of advanced attacks call for a new defense paradigm that is active, autonomous, and adaptive, named as the \texttt{`3A'} defense paradigm. This chapter introduces three defense schemes that actively interact with attackers to increase the attack cost and gather threat information, i.e., defensive deception for detection and counter-deception, feedback-driven Moving Target Defense (MTD), and adaptive honeypot engagement. Due to the cyber deception, external noise, and the absent knowledge of the other players' behaviors and goals, these schemes possess three progressive levels of information restrictions, i.e., from the parameter uncertainty, the payoff uncertainty, to the environmental uncertainty. To estimate the unknown and reduce uncertainty, we adopt three different strategic learning schemes that fit the associated information restrictions. All three learning schemes share the same feedback structure of sensation, estimation, and actions so that the most rewarding policies get reinforced and converge to the optimal ones in autonomous and adaptive fashions. This work aims to shed lights on proactive defense strategies, lay a solid foundation for strategic learning under incomplete information, and quantify the tradeoff between the security and costs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper introduces a '3A' (active, adaptive, autonomous) cyber defense paradigm and presents three schemes—defensive deception for detection/counter-deception, feedback-driven Moving Target Defense (MTD), and adaptive honeypot engagement—under progressive uncertainty levels (parameter, payoff, environmental). It adopts three strategic learning schemes that share a sensation-estimation-action feedback structure to reinforce rewarding policies and achieve convergence to optimal policies in an autonomous, adaptive manner, with the goal of increasing attack costs, gathering threat information, and quantifying security-cost tradeoffs.

Significance. If the convergence claims under incomplete information are rigorously established, the work would provide a conceptual foundation for proactive defense strategies and strategic learning in cyber settings with uncertainty, potentially influencing research on adaptive MTD and deception-based defenses.

major comments (1)
  1. [Abstract] Abstract: The central claim that the three learning schemes 'converge to the optimal ones' via the shared feedback structure is asserted without any derivations, equations, convergence proofs, simulation results, or validation steps; this is load-bearing for the paper's contribution on strategic learning under uncertainty.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their review and the opportunity to clarify the manuscript. The concern regarding the abstract's assertion of convergence is addressed point-by-point below. The full paper provides the supporting analysis as summarized in the abstract.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the three learning schemes 'converge to the optimal ones' via the shared feedback structure is asserted without any derivations, equations, convergence proofs, simulation results, or validation steps; this is load-bearing for the paper's contribution on strategic learning under uncertainty.

    Authors: The abstract is a concise summary of the paper's contributions and does not contain the full technical details, which is standard. The three strategic learning schemes are developed in detail in Sections 3 (defensive deception under parameter uncertainty), 4 (feedback-driven MTD under payoff uncertainty), and 5 (adaptive honeypot engagement under environmental uncertainty). Each section derives the sensation-estimation-action feedback loop, presents the specific learning algorithm (e.g., Q-learning variants or regret-matching), provides convergence proofs to Nash equilibria or optimal policies under the respective uncertainty model using stochastic approximation or Lyapunov analysis, and includes equations for value estimation and policy updates. Section 6 presents simulation results across multiple scenarios validating convergence rates, attack cost increases, and security-cost tradeoffs. If the referee prefers, we can revise the abstract to explicitly reference these sections. revision: partial

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper describes three defense schemes under progressive uncertainty levels and adopts corresponding strategic learning schemes that share a sensation-estimation-action feedback loop, with the claim that rewarding policies are reinforced to converge to optima. This is a high-level assertion aligned with standard reinforcement learning under incomplete information. No equations, fitted parameters presented as predictions, or load-bearing self-citations are exhibited in the text that would reduce the central claim to its inputs by construction. The derivation chain remains self-contained against external benchmarks such as RL convergence results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that strategic learning under the three described uncertainty regimes will produce convergence to optimal policies via the shared sensation-estimation-action loop; no free parameters or invented entities are named in the abstract.

axioms (1)
  • domain assumption Strategic learning schemes can estimate unknown parameters, payoffs, or environments in cyber defense interactions and thereby reduce uncertainty sufficiently for policy convergence.
    Invoked in the abstract when stating that the schemes 'estimate the unknown and reduce uncertainty' and 'converge to the optimal ones'.

pith-pipeline@v0.9.0 · 5718 in / 1253 out tokens · 24285 ms · 2026-05-25T12:20:10.389865+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 2 internal anchors

  1. [1]

    Verizon 2019 data breach investigations report,

    “Verizon 2019 data breach investigations report,” 2019

  2. [2]

    Combatting cyber risks in the supply chain,

    D. Shackleford, “Combatting cyber risks in the supply chain,”SANS. org, 2015

  3. [3]

    Adaptive strategic cyber defense for advanced persistent threats in criticalinfrastructurenetworks,

    L. Huang and Q. Zhu, “Adaptive strategic cyber defense for advanced persistent threats in criticalinfrastructurenetworks,” ACMSIGMETRICSPerformanceEvaluationReview ,vol.46, no. 2, pp. 52–56, 2019

  4. [4]

    Analysis and computation of adaptive defense strategies against advanced persistent threatsforcyber-physicalsystems,

    ——, “Analysis and computation of adaptive defense strategies against advanced persistent threatsforcyber-physicalsystems,”in InternationalConferenceonDecisionandGameTheory for Security. Springer, 2018, pp. 205–226

  5. [5]

    A Dynamic Games Approach to Proactive Defense Strategies against AdvancedPersistentThreatsinCyber-PhysicalSystems,

    L. Huang and Q. Zhu, “A Dynamic Games Approach to Proactive Defense Strategies against AdvancedPersistentThreatsinCyber-PhysicalSystems,” arXive-prints,p.arXiv:1906.09687, Jun 2019

  6. [6]

    Game-theoretic approach to feedback-driven multi-stage moving target defense,

    Q. Zhu and T. Başar, “Game-theoretic approach to feedback-driven multi-stage moving target defense,” inInternational Conference on Decision and Game Theory for Security. Springer, 2013, pp. 246–263. 22 Linan Huang and Quanyan Zhu

  7. [7]

    Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes,

    L. Huang and Q. Zhu, “Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes,”arXiv e-prints, p. arXiv:1906.12182, Jun 2019

  8. [8]

    A Game-Theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy

    J. Pawlick, E. Colbert, and Q. Zhu, “A game-theoretic taxonomy and survey of defensive deception for cybersecurity and privacy,”arXiv preprint arXiv:1712.05441, 2017

  9. [9]

    Integrating cyber-d&d into adversary modeling for active cyber defense,

    F. J. Stech, K. E. Heckman, and B. E. Strom, “Integrating cyber-d&d into adversary modeling for active cyber defense,” inCyber deception. Springer, 2016, pp. 1–22

  10. [10]

    Activecyberdefensewithdenialanddeception:Acyber-wargameexperiment,

    K. E. Heckman, M. J. Walsh, F. J. Stech, T. A. O’boyle, S. R. DiCato, and A. F. Herber, “Activecyberdefensewithdenialanddeception:Acyber-wargameexperiment,” computers& security, vol. 37, pp. 72–77, 2013

  11. [11]

    R-locker:Thwartingran- somware action through a honeyfile-based approach,

    J.Gómez-Hernández,L.Álvarez-González,andP.García-Teodoro,“R-locker:Thwartingran- somware action through a honeyfile-based approach,”Computers & Security, vol. 73, pp. 389–398, 2018

  12. [12]

    Changing the game: The art of deceiving sophisticated attackers,

    N. Virvilis, B. Vanautgaerden, and O. S. Serrano, “Changing the game: The art of deceiving sophisticated attackers,” in2014 6th International Conference On Cyber Conflict (CyCon 2014). IEEE, 2014, pp. 87–97

  13. [13]

    Modelingandanalysisofleakydeceptionusingsignaling games with evidence,

    J.Pawlick,E.Colbert,andQ.Zhu,“Modelingandanalysisofleakydeceptionusingsignaling games with evidence,”IEEE Transactions on Information Forensics and Security, 2018

  14. [14]

    SpringerScience&BusinessMedia,2011,vol.54

    S.Jajodia,A.K.Ghosh,V.Swarup,C.Wang,andX.S.Wang, Movingtargetdefense:creating asymmetricuncertaintyforcyberthreats . SpringerScience&BusinessMedia,2011,vol.54

  15. [15]

    Countering code-injection attacks with instruction-set randomization,

    G. S. Kc, A. D. Keromytis, and V. Prevelakis, “Countering code-injection attacks with instruction-set randomization,” inProceedings of the 10th ACM conference on Computer and communications security. ACM, 2003, pp. 272–280

  16. [16]

    Deceptive routing in relay networks,

    A. Clark, Q. Zhu, R. Poovendran, and T. Başar, “Deceptive routing in relay networks,” in International Conference on Decision and Game Theory for Security. Springer, 2012, pp. 171–185

  17. [17]

    Markov modeling of moving target defense games,

    H. Maleki, S. Valizadeh, W. Koch, A. Bestavros, and M. van Dijk, “Markov modeling of moving target defense games,” inProceedings of the 2016 ACM Workshop on Moving Target Defense. ACM, 2016, pp. 81–92

  18. [18]

    A methodology for intelligent honeypot deployment and active engagement of attackers,

    C. R. Hecker, “A methodology for intelligent honeypot deployment and active engagement of attackers,” Ph.D. dissertation, 2012

  19. [19]

    Deceptive attack and defense game in honeypot-enablednetworksfortheinternetofthings,

    Q. D. La, T. Q. Quek, J. Lee, S. Jin, and H. Zhu, “Deceptive attack and defense game in honeypot-enablednetworksfortheinternetofthings,” IEEEInternetofThingsJournal ,vol.3, no. 6, pp. 1025–1035, 2016

  20. [20]

    Optimal Timing in Dynamic and Robust Attacker Engagement During Advanced Persistent Threats

    J. Pawlick, T. T. H. Nguyen, and Q. Zhu, “Optimal timing in dynamic and robust attacker engagement during advanced persistent threats,”CoRR, vol. abs/1707.08031, 2017. [Online]. Available: http://arxiv.org/abs/1707.08031

  21. [21]

    A Stackelberg game perspective on the conflict between machine learning and data obfuscation,

    J. Pawlick and Q. Zhu, “A Stackelberg game perspective on the conflict between machine learning and data obfuscation,” in Information Forensics and Security (WIFS), 2016 IEEE International Workshop on . IEEE, 2016, pp. 1–6. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7823893/

  22. [22]

    Deployment and exploitation of deceptive honeybots in social networks,

    Q. Zhu, A. Clark, R. Poovendran, and T. Basar, “Deployment and exploitation of deceptive honeybots in social networks,” inDecision and Control (CDC), 2013 IEEE 52nd Annual Conference on. IEEE, 2013, pp. 212–219

  23. [23]

    Hybrid learning in stochastic games and its applications in network security,

    Q. Zhu, H. Tembine, and T. Basar, “Hybrid learning in stochastic games and its applications in network security,”Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, pp. 305–329, 2013

  24. [24]

    Interference aware routing game for cognitive radio multi-hop networks,

    Q. Zhu, Z. Yuan, J. B. Song, Z. Han, and T. Başar, “Interference aware routing game for cognitive radio multi-hop networks,”Selected Areas in Communications, IEEE Journal on, vol. 30, no. 10, pp. 2006–2015, 2012

  25. [25]

    Game-theoreticanalysisofnodecaptureandcloningattack with multiple attackers in wireless sensor networks,

    Q.Zhu,L.Bushnell,andT.Basar,“Game-theoreticanalysisofnodecaptureandcloningattack with multiple attackers in wireless sensor networks,” inDecision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 3404–3411

  26. [26]

    Deceptive routing games,

    Q. Zhu, A. Clark, R. Poovendran, and T. Başar, “Deceptive routing games,” inDecision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 2704–2711. Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense 23

  27. [27]

    A stochastic game model for jamming in multi-channel cognitive radio systems

    Q. Zhu, H. Li, Z. Han, and T. Basar, “A stochastic game model for jamming in multi-channel cognitive radio systems.” inICC, 2010, pp. 1–6

  28. [28]

    Secure and practical output feedback control for cloud-enabled cyber- physical systems,

    Z. Xu and Q. Zhu, “Secure and practical output feedback control for cloud-enabled cyber- physical systems,” inCommunications and Network Security (CNS), 2017 IEEE Conference on. IEEE, 2017, pp. 416–420

  29. [29]

    A Game-Theoretic Approach to Secure Control of Communication-Based Train Control Systems Under Jamming Attacks,

    ——, “A Game-Theoretic Approach to Secure Control of Communication-Based Train Control Systems Under Jamming Attacks,” inProceedings of the 1st International Workshop on Safe Control of Connected and Autonomous Vehicles. ACM, 2017, pp. 27–34. [Online]. Available: http://dl.acm.org/citation.cfm?id=3055381

  30. [30]

    Cross-layer secure cyber-physical control system design for networked 3d printers,

    ——, “Cross-layer secure cyber-physical control system design for networked 3d printers,” in American Control Conference (ACC), 2016. IEEE, 2016, pp. 1191–1196. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7525079/

  31. [31]

    Modeling, analysis, and mitigation of dynamic botnet formation in wireless iot networks,

    M. J. Farooq and Q. Zhu, “Modeling, analysis, and mitigation of dynamic botnet formation in wireless iot networks,”IEEE Transactions on Information Forensics and Security, 2019

  32. [32]

    A cyber-physical game framework for secure and resilient multi-agent autonomous systems,

    Z. Xu and Q. Zhu, “A cyber-physical game framework for secure and resilient multi-agent autonomous systems,” inDecision and Control (CDC), 2015 IEEE 54th Annual Conference on. IEEE, 2015, pp. 5156–5161

  33. [33]

    Alarge-scalemarkovgameapproachtodynamicprotectionof interdependent infrastructure networks,

    L.Huang,J.Chen,andQ.Zhu,“Alarge-scalemarkovgameapproachtodynamicprotectionof interdependent infrastructure networks,” inInternational Conference on Decision and Game Theory for Security. Springer, 2017, pp. 357–376

  34. [34]

    Adynamicgameanalysisanddesignofinfrastructurenetwork protection and recovery,

    J.Chen,C.Touati,andQ.Zhu,“Adynamicgameanalysisanddesignofinfrastructurenetwork protection and recovery,”ACM SIGMETRICS Performance Evaluation Review, vol. 45, no. 2, p. 128, 2017

  35. [35]

    A hybrid stochastic game for secure control of cyber-physical systems,

    F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas, “A hybrid stochastic game for secure control of cyber-physical systems,”Automatica, vol. 93, pp. 55–63, 2018

  36. [36]

    Resilient control of cyber-physical systems againstdenial-of-serviceattacks,

    Y. Yuan, Q. Zhu, F. Sun, Q. Wang, and T. Basar, “Resilient control of cyber-physical systems againstdenial-of-serviceattacks,”in ResilientControlSystems(ISRCS),20136thInternational Symposium on. IEEE, 2013, pp. 54–59

  37. [37]

    GADAPT: A Sequential Game-Theoretic Framework for Designing Defense-in-Depth Strategies Against Advanced Persistent Threats,

    S. Rass and Q. Zhu, “GADAPT: A Sequential Game-Theoretic Framework for Designing Defense-in-Depth Strategies Against Advanced Persistent Threats,” inDecision and Game TheoryforSecurity ,ser.LectureNotesinComputerScience,Q.Zhu,T.Alpcan,E.Panaousis, M. Tambe, and W. Casey, Eds. Cham: Springer International Publishing, 2016, vol. 9996, pp. 314–326

  38. [38]

    Dynamicinterferenceminimizationrouting game for on-demand cognitive pilot channel,

    Q.Zhu,Z.Yuan,J.B.Song,Z.Han,andT.Basar,“Dynamicinterferenceminimizationrouting game for on-demand cognitive pilot channel,” inGlobal Telecommunications Conference (GLOBECOM 2010), 2010 IEEE. IEEE, 2010, pp. 1–6

  39. [39]

    Strategic defense against deceptive civilian gps spoofing of unmanned aerial vehicles,

    T. Zhang and Q. Zhu, “Strategic defense against deceptive civilian gps spoofing of unmanned aerial vehicles,” inInternational Conference on Decision and Game Theory for Security. Springer, 2017, pp. 213–233

  40. [40]

    Analysis and computation of adaptive defense strategies against ad- vancedpersistentthreatsforcyber-physicalsystems,

    L. Huang and Q. Zhu, “Analysis and computation of adaptive defense strategies against ad- vancedpersistentthreatsforcyber-physicalsystems,”in InternationalConferenceonDecision and Game Theory for Security, 2018

  41. [41]

    Adaptivestrategiccyberdefenseforadvancedpersistentthreatsincriticalinfrastructure networks,

    ——,“Adaptivestrategiccyberdefenseforadvancedpersistentthreatsincriticalinfrastructure networks,” inACM SIGMETRICS Performance Evaluation Review, 2018

  42. [42]

    Flip the cloud: Cyber-physical signaling games in the presenceofadvancedpersistentthreats,

    J. Pawlick, S. Farhang, and Q. Zhu, “Flip the cloud: Cyber-physical signaling games in the presenceofadvancedpersistentthreats,”in DecisionandGameTheoryforSecurity . Springer, 2015, pp. 289–308

  43. [43]

    A dynamic bayesian security game framework for strategic defense mechanism design,

    S. Farhang, M. H. Manshaei, M. N. Esfahani, and Q. Zhu, “A dynamic bayesian security game framework for strategic defense mechanism design,” inDecision and Game Theory for Security. Springer, 2014, pp. 319–328

  44. [44]

    Dynamicpolicy-basedidsconfiguration,

    Q.ZhuandT.Başar,“Dynamicpolicy-basedidsconfiguration,”in DecisionandControl,2009 held jointly with the 2009 28th Chinese Control Conference. CDC/CCC 2009. Proceedings of the 48th IEEE Conference on. IEEE, 2009, pp. 8600–8605

  45. [45]

    Networksecurityconfigurations:Anonzero-sumstochastic gameapproach,

    Q.Zhu,H.Tembine,andT.Basar,“Networksecurityconfigurations:Anonzero-sumstochastic gameapproach,”in AmericanControlConference(ACC),2010 . IEEE,2010,pp.1059–1064. 24 Linan Huang and Quanyan Zhu

  46. [46]

    Heterogeneouslearninginzero-sumstochasticgameswith incomplete information,

    Q.Zhu,H.Tembine,andT.Başar,“Heterogeneouslearninginzero-sumstochasticgameswith incomplete information,” in49th IEEE conference on decision and control (CDC). IEEE, 2010, pp. 219–224

  47. [47]

    Security as a Service for Cloud-Enabled Internet of Controlled Things under Advanced Persistent Threats: A Contract Design Approach,

    J. Chen and Q. Zhu, “Security as a Service for Cloud-Enabled Internet of Controlled Things under Advanced Persistent Threats: A Contract Design Approach,” IEEE Transactions on Information Forensics and Security , 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7954676/

  48. [48]

    ABi-LevelGameApproachtoAttack-AwareCyberInsurance ofComputerNetworks,

    R.Zhang,Q.Zhu,andY.Hayel,“ABi-LevelGameApproachtoAttack-AwareCyberInsurance ofComputerNetworks,” IEEEJournalonSelectedAreasinCommunications ,vol.35,no.3,pp. 779–794, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7859343/

  49. [49]

    Attack-aware cyber insurance of interdependent computer networks,

    R. Zhang and Q. Zhu, “Attack-aware cyber insurance of interdependent computer networks,” 2016

  50. [50]

    Compliance control: Managed vulnera- bility surface in social-technological systems via signaling games,

    W. A. Casey, Q. Zhu, J. A. Morales, and B. Mishra, “Compliance control: Managed vulnera- bility surface in social-technological systems via signaling games,” inProceedings of the 7th ACM CCS International Workshop on Managing Insider Security Threats. ACM, 2015, pp. 53–62

  51. [51]

    Attack-aware cyber insurance for risk sharing in computer networks,

    Y. Hayel and Q. Zhu, “Attack-aware cyber insurance for risk sharing in computer networks,” in Decision and Game Theory for Security. Springer, 2015, pp. 22–34

  52. [52]

    Epidemic protection over heterogeneous networks using evolutionary poisson games,

    ——, “Epidemic protection over heterogeneous networks using evolutionary poisson games,” IEEETransactionsonInformationForensicsandSecurity ,vol.12,no.8,pp.1786–1800,2017

  53. [53]

    Guidex:Agame-theoreticincentive-basedmech- anismforintrusiondetectionnetworks,

    Q.Zhu,C.Fung,R.Boutaba,andT.Başar,“Guidex:Agame-theoreticincentive-basedmech- anismforintrusiondetectionnetworks,” SelectedAreasinCommunications,IEEEJournalon , vol. 30, no. 11, pp. 2220–2230, 2012

  54. [54]

    Tragedy of anticommons in digital right management of medical records

    Q. Zhu, C. A. Gunter, and T. Basar, “Tragedy of anticommons in digital right management of medical records.” inHealthSec, 2012

  55. [55]

    Agame-theoreticalapproachtoincentivedesignin collaborative intrusion detection networks,

    Q.Zhu,C.Fung,R.Boutaba,andT.Başar,“Agame-theoreticalapproachtoincentivedesignin collaborative intrusion detection networks,” inGame Theory for Networks, 2009. GameNets’

  56. [56]

    IEEE, 2009, pp

    International Conference on. IEEE, 2009, pp. 384–392

  57. [57]

    A game theoretic investigation of deception in network security,

    T. E. Carroll and D. Grosu, “A game theoretic investigation of deception in network security,” Security and Commun. Nets., vol. 4, no. 10, pp. 1162–1172, 2011

  58. [58]

    A Stackelberg game perspective on the conflict between machine learninganddataobfuscation,

    J. Pawlick and Q. Zhu, “A Stackelberg game perspective on the conflict between machine learninganddataobfuscation,” IEEEIntl.WorkshoponInform.ForensicsandSecurity ,2016

  59. [59]

    DynamicdifferentialprivacyforADMM-baseddistributedclassification learning,

    T.ZhangandQ.Zhu,“DynamicdifferentialprivacyforADMM-baseddistributedclassification learning,” IEEE Transactions on Information Forensics and Security, vol. 12, no. 1, pp. 172–187, 2017. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/7563366/

  60. [60]

    Phy-layerlocationprivacy-preservingaccesspointselection mechanism in next-generation wireless networks,

    S.Farhang,Y.Hayel,andQ.Zhu,“Phy-layerlocationprivacy-preservingaccesspointselection mechanism in next-generation wireless networks,” inCommunications and Network Security (CNS), 2015 IEEE Conference on. IEEE, 2015, pp. 263–271

  61. [61]

    Distributedprivacy-preservingcollaborativeintrusiondetectionsystems for vanets,

    T.ZhangandQ.Zhu,“Distributedprivacy-preservingcollaborativeintrusiondetectionsystems for vanets,”IEEE Transactions on Signal and Information Processing over Networks, vol. 4, no. 1, pp. 148–161, 2018

  62. [62]

    gPath: A game-theoretic path selection algorithm to protect tor’s anonymity,

    N. Zhang, W. Yu, X. Fu, and S. K. Das, “gPath: A game-theoretic path selection algorithm to protect tor’s anonymity,” inDecision and Game Theory for Security. Springer, 2010, pp. 58–71

  63. [63]

    Security games with unknown adversarial strategies,

    A. Garnaev, M. Baykal-Gursoy, and H. V. Poor, “Security games with unknown adversarial strategies,”IEEE transactions on cybernetics, vol. 46, no. 10, pp. 2291–2299, 2015

  64. [64]

    Distributed strategic learning with application to network security,

    Q. Zhu, H. Tembine, and T. Başar, “Distributed strategic learning with application to network security,” inProceedings of the 2011 American Control Conference. IEEE, 2011, pp. 4057– 4062

  65. [65]

    Multi-agentreinforcementlearningforintrusiondetection:Acase study and evaluation,

    A.ServinandD.Kudenko,“Multi-agentreinforcementlearningforintrusiondetection:Acase study and evaluation,” inGerman Conference on Multiagent System Technologies. Springer, 2008, pp. 159–170

  66. [66]

    Distributedbayesianlearninginmultiagentsystems:Improvingour understanding of its capabilities and limitations,

    P.M.DjurićandY.Wang,“Distributedbayesianlearninginmultiagentsystems:Improvingour understanding of its capabilities and limitations,”IEEE Signal Processing Magazine, vol. 29, no. 2, pp. 65–76, 2012. Strategic Learning for Active, Adaptive, and Autonomous Cyber Defense 25

  67. [67]

    Coordination in multiagent reinforcement learning: a bayesianapproach,

    G. Chalkiadakis and C. Boutilier, “Coordination in multiagent reinforcement learning: a bayesianapproach,”in ProceedingsofthesecondinternationaljointconferenceonAutonomous agents and multiagent systems. ACM, 2003, pp. 709–716

  68. [68]

    Distributedreinforcementlearningforpowerlimitedmany-core system performance optimization,

    Z.ChenandD.Marculescu,“Distributedreinforcementlearningforpowerlimitedmany-core system performance optimization,” inProceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition. EDA Consortium, 2015, pp. 1521–1526

  69. [69]

    Games with incomplete information played by “bayesian

    J. C. Harsanyi, “Games with incomplete information played by “bayesian” players, i–iii part i. the basic model,”Management science, vol. 14, no. 3, pp. 159–182, 1967

  70. [70]

    Transfer learning for reinforcement learning domains: A survey,

    M. E. Taylor and P. Stone, “Transfer learning for reinforcement learning domains: A survey,” Journal of Machine Learning Research, vol. 10, no. Jul, pp. 1633–1685, 2009