Interaction-Breaking Adversarial Learning Framework for Robust Multi-Agent Reinforcement Learning

Mingu Kang; Seungyul Han; Sunwoo Lee; Yonghyeon Jo

arXiv: 2605.18024 · v2 · pith:6E5DKUXP · submitted 2026-05-18 · 💻 cs.LG · cs.AI · cs.MA

Sunwoo Lee, Mingu Kang, Yonghyeon Jo, Seungyul Han

Pith reviewed 2026-05-20 12:19 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.MA

keywords multi-agent reinforcement learning · adversarial robustness · coordination · interaction breaking · information-theoretic attacks · MARL

The pith

Multi-agent reinforcement learning agents can be trained to keep coordinating when their observations and actions face adversarial perturbations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper focuses on the problem that coordination learned by multiple agents often breaks when external changes disrupt how they share information or act together. It proposes a framework that builds attacks by using information theory to alter what agents see and do, then trains the agents to complete tasks reliably even under those changes. This targets the interaction structure itself rather than only the values or rewards. Results show the trained agents handle a wider range of disruptions better than earlier robust methods and continue to work when some agents disappear entirely.

Core claim

The central claim is that an interaction-breaking adversarial learning framework, built on an information-theoretic view of attacks, can generate perturbations to agents' observations and actions that specifically impede coordination, and that training agents against these perturbations produces policies that remain effective when real disruptions occur.

What carries the argument

The interaction-breaking adversarial learning (IBAL) framework that constructs attacks by perturbing agents' observations and actions to reduce shared information.
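For background, "shared information" between two random variables is conventionally measured by mutual information. The block below is the textbook definition for discrete variables $X$ and $Y$. Which variables the paper actually uses (for example, one group's observations or actions against another group's) and how its Eq. (1) combines the terms are not reproduced on this page, so this is context rather than the paper's objective.

```latex
I(X;Y) \;=\; \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
       \;=\; H(X) - H(X \mid Y) \;\ge\; 0,
\qquad
I(X;Y) = 0 \iff X \text{ and } Y \text{ are independent.}
```

A perturbation that drives this quantity toward zero makes one side uninformative about the other, which is the sense in which an attack can reduce shared information.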

If this is right

The approach yields higher robustness than prior robust multi-agent reinforcement learning methods across varied attack types.
Performance remains stronger in settings where some agents are missing.
Robustness extends to corruption of interaction structures, not only to value-based attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same perturbation idea could be tested in single-agent tasks where environment changes mimic loss of useful signals.
Different measures of information, such as mutual information variants, might produce stronger or weaker attacks worth comparing directly.
Deployment in systems like vehicle fleets or robot teams would reveal whether the learned robustness transfers beyond simulated attacks.

Load-bearing premise

Perturbations chosen by information-theoretic measures on observations and actions serve as a good model for the interaction disruptions that actually occur in real multi-agent environments.

What would settle it

Measure whether agents trained under the proposed framework complete cooperative tasks at higher rates than baselines when placed in a physical testbed that introduces real sensor noise or intermittent communication loss.

Figures

Figures reproduced from arXiv: 2605.18024 by Mingu Kang, Seungyul Han, Sunwoo Lee, Yonghyeon Jo.

**Figure 1.** Illustration of the proposed interaction-breaking attack in StarCraft II: (a) normal, (b) observation attack, (c) action attack. Excerpt: "… how agents influence one another, and therefore may fail to capture attacks that intentionally break inter-agent relationships, under which coordination can collapse abruptly. To study and mitigate this vulnerability, we propose an interaction-breaking attack that explicitly tar…"

**Figure 2.** Dimension-wise MI for the G1 agent in the StarCraft II scenario. MI values are normalized to [0, 1]. Here, |G1| = 1 and |G2| = 7, and we set L = 5 × |G2| to match the number of G1 observation dimensions used to observe G2 agents. Excerpt: "Action attacker. We design an action attacker that directly minimizes the action-level MI. Given the perturbed observations $\tilde{o}_t$, the ego joint policy first samples an intermediat…"
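The idea of ranking observation dimensions by MI and spending a masking budget L on them can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the plug-in estimator on discrete data, the zero fill value, the rule "mask the L highest-MI dimensions", and the helper names `mutual_information`, `top_l_mask`, and `apply_mask`. The paper estimates MI with learned reconstruction models (Figure 10), which this sketch does not attempt.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in MI estimate (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def top_l_mask(mi_scores, budget):
    """Indices of the `budget` dimensions with the highest MI score (assumed rule)."""
    ranked = sorted(range(len(mi_scores)), key=lambda d: mi_scores[d], reverse=True)
    return sorted(ranked[:budget])


def apply_mask(obs, masked_dims, fill=0):
    """Overwrite the selected dimensions with a constant (hypothetical fill value)."""
    masked = set(masked_dims)
    return [fill if d in masked else v for d, v in enumerate(obs)]


# Toy data: dimension 0 copies the other agent's action, dimension 2 copies it
# 80% of the time, dimensions 1 and 3 are pure noise.
rng = random.Random(0)
other_actions = [rng.randint(0, 1) for _ in range(2000)]
observations = [
    [a, rng.randint(0, 1), a if rng.random() < 0.8 else 1 - a, rng.randint(0, 1)]
    for a in other_actions
]

mi_scores = [
    mutual_information([o[d] for o in observations], other_actions)
    for d in range(4)
]
masked_dims = top_l_mask(mi_scores, budget=2)
attacked = apply_mask(observations[0], masked_dims)
print(masked_dims, attacked)
```

With a budget of 2, the two informative dimensions (0 and 2) are the ones selected, so the masked observation no longer carries the other agent's action in this toy setup.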

**Figure 5.** Average test win rate under various adversarial attacks. Excerpt (Section 5.1, Experimental Setup): "In this section, we compare the robustness of the proposed IBAL and prior robust MARL methods on the StarCraft II Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), which requires cooperative decision-making among ally units to defeat enemy units. As illustrated in …"

**Figure 6.** Average test win rate under various non-parametric perturbations. Excerpt (Section 5.2, Performance Comparison): "We report the mean test win rate over 5 seeds under adversarial attacks and non-parametric perturbations in …"

**Figure 7.** Trajectory analysis of the interaction-breaking attack and IBAL policies in 8m and MMM tasks.

**Figure 8.** Component evaluation: (a) 2s3z, (b) 8m.

**Figure 9.** Analysis on the maximum group size K. Excerpt: "… wise evaluations and analyze sensitivity to the maximum group size K. In Appendix F.3, we provide additional analyses on the masking budget L, the minimum attack probability, and computational complexity. Component Evaluation. To quantify each component’s contribution, we evaluate ablations on 8m under Dis-1. We compare IBAL (Ours) with four variants: IBAL w/o adapti…"

**Figure 10.** Reconstruction models for MI estimation: (a) observation reconstruction model and (b) action reconstruction model. Excerpt: "The proposed joint interaction-breaking attackers $f^{\mathrm{IB}}_{\mathrm{adv}}$ and $\pi^{\mathrm{IB}}_{\mathrm{adv}}$ requires estimating the associated MI terms. In particular, Eq. (1) decomposes the total interaction into the action-level MI term and the observation-level MI term, both of which must be estimated to construct the attack…"

**Figure 11.** Visualization of SMAC scenarios. Excerpt: "SMAC is a standard testbed for cooperative multi-agent reinforcement learning, designed around decentralized micromanagement in StarCraft II. In SMAC, each unit is controlled by an individual agent that makes decisions from its own partial, local observations without access to global state at execution time. The benchmark provides diverse combat settings, enabling systema…"

**Figure 12.** Performance comparison under unseen interaction-breaking attack.

**Figure 13.** Per-timestep values of the group redundancy, the group-wise MI, and the individual MI. Excerpt: "We observe that (a) the group redundancy stays close to zero and its magnitude is substantially smaller than the MI values in (b) and (c), suggesting that it has negligible impact on dimension-wise MI selection. Overall, these results empirically support the validity of decomposing the observation-level MI ter…"

**Figure 14.** Trajectory analysis of IBAL under Dis-1 setting on 8m and MMM scenarios. Excerpt: "In the main paper, we analyzed how agents respond to the proposed interaction-breaking attack that suppresses cross-group coordination. Here, we further examine IBAL under non-parametric perturbations, focusing on Dis-1 in 8m and MMM. Under Dis-1, one agent becomes disabled and remains stationary, contributing no further actions to c…"
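The Dis-1 perturbation described in this caption (one agent disabled and held stationary) can be mimicked in a toy setting by overriding that agent's action before it reaches the environment. This is a hedged sketch and not the authors' evaluation code: `STOP_ACTION`, `disable_agents`, and the integer action encoding are hypothetical, and which agent is disabled, and when, is assumed here to be a single uniformly random choice.

```python
import random

STOP_ACTION = 0  # hypothetical id for a "stay still" action; depends on the environment


def disable_agents(joint_action, disabled, stop_action=STOP_ACTION):
    """Replace the actions of disabled agents with the stationary action."""
    return [stop_action if i in disabled else a for i, a in enumerate(joint_action)]


rng = random.Random(0)
n_agents = 8                          # e.g. the eight allied units in 8m
disabled = {rng.randrange(n_agents)}  # Dis-1: exactly one agent is disabled (assumed random pick)

joint_action = [3, 5, 2, 4, 1, 6, 2, 3]  # made-up actions chosen by the policies
perturbed = disable_agents(joint_action, disabled)
print(sorted(disabled), perturbed)
```

Keeping the override outside the policies means the remaining agents still act on their own observations, so any recovery has to come from how they adapt to the missing teammate.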

**Figure 15.** Ablation study for the masking budget L. Excerpt: "Minimum Attack Probability $P^{\min}_{\mathrm{act}}$. We ablate the minimum attack probability $P^{\min}_{\mathrm{act}}$ in our scheduling scheme, where $P_{\mathrm{act}} \sim \mathrm{Unif}(P^{\min}_{\mathrm{act}}, P^{\max}_{\mathrm{act}})$ is sampled during training. In this ablation, we use K = 1 for 2s3z and K = 4 for 8m; thus, we sweep $P^{\min}_{\mathrm{act}}$ up to the maximum probability for each environment. As shown in …"
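The scheduling scheme quoted here draws the attack probability uniformly between a minimum and a maximum during training. A minimal sketch of that sampling step follows. The numeric bounds, the choice to resample once per episode, and the per-timestep Bernoulli draw are all assumptions for illustration, as are the function names; the caption does not specify these details.

```python
import random


def sample_attack_probability(p_min, p_max, rng):
    """Draw P_act ~ Unif(p_min, p_max), as in the quoted scheduling scheme."""
    return rng.uniform(p_min, p_max)


def attack_schedule(episode_len, p_act, rng):
    """Per-timestep attack flags, each True with probability p_act (assumed usage)."""
    return [rng.random() < p_act for _ in range(episode_len)]


rng = random.Random(0)
p_min, p_max = 0.2, 0.8  # hypothetical bounds, not values taken from the paper

p_act = sample_attack_probability(p_min, p_max, rng)  # assumed: resampled each episode
flags = attack_schedule(50, p_act, rng)
print(round(p_act, 3), sum(flags), "attacked steps out of", len(flags))
```

Raising `p_min` toward `p_max` narrows the range of attack intensities seen in training, which is the quantity the ablation in Figure 16 sweeps.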

**Figure 16.** Ablation study for the minimum attack probability $P^{\min}_{\mathrm{act}}$.

Original abstract

Cooperation is central to multi-agent reinforcement learning (MARL), yet learned coordination can be fragile when external perturbations disrupt inter-agent interactions. Prior robust MARL methods have primarily considered value-oriented attacks, leaving a gap in robustness when interaction structures themselves are corrupted. In this paper, we propose an interaction-breaking adversarial learning (IBAL) framework that takes an information-theoretic view to construct attacks that impede coordination by perturbing agents' observations and actions, and trains agents to perform reliably under such disruptions. Empirically, our approach improves robustness over existing robust MARL baselines across diverse attack settings and yields stronger performance even under agent-missing scenarios. Our code is available at https://sunwoolee0504.github.io/IBAL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes an Interaction-Breaking Adversarial Learning (IBAL) framework for robust multi-agent reinforcement learning. It adopts an information-theoretic perspective to generate attacks that impede inter-agent coordination by perturbing agents' observations and actions, then trains policies to remain effective under these disruptions. The central empirical claim is that IBAL improves robustness over existing robust MARL baselines across diverse attack settings and yields stronger performance even in agent-missing scenarios.

Significance. If the empirical results hold and the information-theoretic perturbations prove representative of real coordination-breaking disruptions, the work would address a genuine gap in robust MARL by shifting focus from value-oriented attacks to interaction-structure corruption. This could be useful for applications such as multi-robot coordination where learned policies must tolerate partial observability or communication failures.

major comments (1)

[Experiments] Experiments section: the manuscript reports robustness gains over baselines under its own attack family and agent-missing scenarios, yet contains no ablation that replaces the information-theoretic attack generator with an alternative disruption model (e.g., direct reward hacking or dynamics perturbation) while keeping the training procedure otherwise identical. Without this comparison, the reported improvements could be explained by the specific attack distribution rather than by a general interaction-breaking principle, which is load-bearing for the headline claim.

minor comments (1)

[Abstract] Abstract: the claim of improvement 'across diverse attack settings' is stated without enumerating the settings or metrics, making it difficult for readers to gauge the breadth of the evaluation from the outset.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below.

Referee: [Experiments] Experiments section: the manuscript reports robustness gains over baselines under its own attack family and agent-missing scenarios, yet contains no ablation that replaces the information-theoretic attack generator with an alternative disruption model (e.g., direct reward hacking or dynamics perturbation) while keeping the training procedure otherwise identical. Without this comparison, the reported improvements could be explained by the specific attack distribution rather than by a general interaction-breaking principle, which is load-bearing for the headline claim.

Authors: We appreciate the referee's observation. The information-theoretic attack generator is a defining component of the IBAL framework because it explicitly targets reductions in mutual information to disrupt coordination, which is distinct from value-oriented attacks studied in prior work. Our experiments already evaluate robustness under the proposed attack family as well as agent-missing scenarios, the latter of which constitutes an alternative form of interaction disruption. Nevertheless, we agree that an ablation replacing the attack generator with alternatives such as direct reward hacking or dynamics perturbation (while holding the remainder of the training procedure fixed) would help isolate the contribution of the interaction-breaking principle. We will add this comparison in the revised manuscript.

Revision: yes

Circularity Check

0 steps flagged

No circularity: framework and robustness claims are empirically grounded without self-referential reductions

The paper introduces the IBAL framework as a novel information-theoretic method for constructing adversarial perturbations to observations and actions that impede inter-agent coordination, then demonstrates empirical robustness gains over baselines in multiple attack settings and agent-missing scenarios. No equations, fitted parameters, or self-citations are shown in the abstract or described structure that reduce the claimed improvements to a definition, renaming, or input by construction. The central premise relies on external empirical validation against existing robust MARL methods rather than internal loops or uniqueness theorems imported from prior author work. This qualifies as a self-contained proposal with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

pith-pipeline@v0.9.0 · 5643 in / 1124 out tokens · 49647 ms · 2026-05-20T12:19:20.107478+00:00 · methodology
