Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. Such updates include adjustments to incident response procedures and modifications to access controls. Reinforcement learning methods have been proposed for automating these policy adaptations, but most methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. The method uses a model or simulator of the system, which is updated when changes occur, and combines three components: belief estimation through particle filtering, offline policy computation through feature-based aggregation, and online policy adaptation through rollout. In particular, feature-based aggregation enables scalable offline optimization of a policy, while rollout adapts the policy online to changes in the system model without repeating the offline optimization. We analyze the approximation error of the aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
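To make the pipeline the abstract describes concrete, below is a minimal Python sketch of its three components: a particle-filter belief update, an offline-computed base policy (taken here as a given function), and online rollout over the belief particles. This is an illustration under stated assumptions, not the paper's implementation; the simulator interface step(state, action) -> (next_state, observation, cost), obs_likelihood, and all other names are hypothetical.

```python
import random

# Minimal sketch (not the paper's code) of the three components:
# particle-filter belief estimation, an offline base policy (passed
# in as a stub), and online one-step lookahead rollout. The simulator
# interface step(state, action) -> (next_state, observation, cost)
# and obs_likelihood(observation, state) are hypothetical.

def update_belief(particles, action, observation, step, obs_likelihood):
    """Particle filter: propagate each particle through the simulator
    under `action`, weight by the likelihood of the new observation,
    and resample to form the updated belief."""
    propagated = [step(s, action)[0] for s in particles]
    weights = [obs_likelihood(observation, s) for s in propagated]
    if sum(weights) == 0.0:
        # Observation inconsistent with every particle: fall back to
        # uniform weights rather than resampling with zero total weight.
        weights = [1.0] * len(propagated)
    return random.choices(propagated, weights=weights, k=len(particles))

def rollout_action(particles, actions, base_policy, step, horizon=20):
    """One-step lookahead rollout: for each candidate first action,
    estimate the cost of taking it and then following the offline base
    policy, averaged over the belief particles. Because the lookahead
    queries the (updated) simulator online, the chosen action adapts
    to model changes without repeating the offline optimization."""
    def simulated_cost(state, first_action):
        total_cost, s, a = 0.0, state, first_action
        for _ in range(horizon):
            s, _, c = step(s, a)
            total_cost += c
            a = base_policy(s)
        return total_cost

    return min(actions,
               key=lambda a: sum(simulated_cost(s, a) for s in particles)
               / len(particles))
```

In this sketch, the offline feature-based aggregation would produce `base_policy`; rollout then improves on it online, which matches the abstract's claim that adaptation does not require rerunning the offline step.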
Forward citations
Cited by 2 Pith papers
-
On-Line Policy Iteration with Trajectory-Driven Policy Generation
An online policy iteration algorithm produces a sequence of monotonically cost-improving policies for fixed-initial-state deterministic control by training each new policy on the trajectory generated by the prior one.