Adaptive Network Security Policies via Belief Aggregation and Rollout
Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. Such updates include adjustments to incident response procedures and modifications to access controls. Reinforcement learning methods have been proposed for automating these policy adaptations, but most methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. The method uses a model or simulator of the system, which is updated when changes occur, and combines three components: belief estimation through particle filtering, offline policy computation through feature-based aggregation, and online policy adaptation through rollout. In particular, feature-based aggregation enables scalable offline optimization of a policy, while rollout adapts the policy online to changes in the system model without repeating the offline optimization. We analyze the approximation error of the aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.
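To make the pipeline the abstract describes concrete, below is a minimal Python sketch of its three components: a particle-filter belief update, an offline-computed base policy (taken here as a given function), and online rollout over the belief particles. This is an illustration under stated assumptions, not the paper's implementation; the simulator interface step(state, action) -> (next_state, observation, cost), obs_likelihood, and all other names are hypothetical.

```python
import random

# Minimal sketch (not the paper's code) of the three components:
# particle-filter belief estimation, an offline base policy (passed
# in as a stub), and online one-step lookahead rollout. The simulator
# interface step(state, action) -> (next_state, observation, cost)
# and obs_likelihood(observation, state) are hypothetical.

def update_belief(particles, action, observation, step, obs_likelihood):
    """Particle filter: propagate each particle through the simulator
    under `action`, weight by the likelihood of the new observation,
    and resample to form the updated belief."""
    propagated = [step(s, action)[0] for s in particles]
    weights = [obs_likelihood(observation, s) for s in propagated]
    if sum(weights) == 0.0:
        # Observation inconsistent with every particle: fall back to
        # uniform weights rather than resampling with zero total weight.
        weights = [1.0] * len(propagated)
    return random.choices(propagated, weights=weights, k=len(particles))

def rollout_action(particles, actions, base_policy, step, horizon=20):
    """One-step lookahead rollout: for each candidate first action,
    estimate the cost of taking it and then following the offline base
    policy, averaged over the belief particles. Because the lookahead
    queries the (updated) simulator online, the chosen action adapts
    to model changes without repeating the offline optimization."""
    def simulated_cost(state, first_action):
        total_cost, s, a = 0.0, state, first_action
        for _ in range(horizon):
            s, _, c = step(s, a)
            total_cost += c
            a = base_policy(s)
        return total_cost

    return min(actions,
               key=lambda a: sum(simulated_cost(s, a) for s in particles)
               / len(particles))
```

In this sketch, the offline feature-based aggregation would produce `base_policy`; rollout then improves on it online, which matches the abstract's claim that adaptation does not require rerunning the offline step.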
Forward citations
Cited by 2 Pith papers
-
On-Line Policy Iteration with Trajectory-Driven Policy Generation
An online policy iteration algorithm produces a sequence of monotonically cost-improving policies for fixed-initial-state deterministic control by training each new policy on the trajectory generated by the prior one.