Recognition: 2 theorem links
Insider Attacks in Multi-Agent LLM Consensus Systems
Pith reviewed 2026-05-12 00:45 UTC · model grok-4.3
The pith
A malicious insider in a multi-agent LLM system can learn surrogate dynamics over benign agents' latent states and use reinforcement learning to delay consensus more effectively than static malicious prompts.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A malicious insider learns surrogate dynamics over the latent behavioral states of benign agents and trains an attacker policy via reinforcement learning; this policy reduces the benign consensus rate and prolongs disagreement more effectively than direct malicious prompting.
What carries the argument
The world-model-based attack framework that learns surrogate dynamics over latent behavioral states of benign agents to enable reinforcement learning optimization of the attacker's message choices.
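The two-stage pipeline described above (fit surrogate dynamics from observed benign-agent behavior, then optimize the attacker by reinforcement learning inside that surrogate) can be sketched in miniature. Everything below is an illustrative assumption, not the paper's implementation: latent behavioral states are discretized into a handful of disagreement levels, the surrogate is a simple count-based transition model, and the attacker is tabular Q-learning with the indicator reward r_t = 1{disagreement > 0}.

```python
import random
from collections import defaultdict

random.seed(0)

K = 4          # discretized latent disagreement levels; 0 = consensus reached
N_ACTIONS = 3  # attacker message classes (e.g., agree / deflect / contradict)

def true_step(s, a):
    # Stand-in for the real multi-agent environment: benign agents drift
    # toward consensus (s -> s - 1) unless the attacker's action disrupts them.
    drift = -1 if random.random() < 0.7 - 0.2 * a else 0
    bump = 1 if a == 2 and random.random() < 0.3 else 0
    return max(0, min(K - 1, s + drift + bump))

# Stage 1: learn surrogate dynamics P(s' | s, a) from logged interactions.
counts = defaultdict(lambda: defaultdict(int))
for _ in range(20000):
    s, a = random.randrange(K), random.randrange(N_ACTIONS)
    counts[(s, a)][true_step(s, a)] += 1

def surrogate_step(s, a):
    states, weights = zip(*counts[(s, a)].items())
    return random.choices(states, weights)[0]

# Stage 2: train a tabular Q-learning attacker entirely inside the surrogate.
# Reward 1 whenever disagreement persists, mirroring the indicator reward.
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(5000):
    s = K - 1
    for _ in range(20):
        a = random.randrange(N_ACTIONS) if random.random() < eps else \
            max(range(N_ACTIONS), key=lambda x: Q[(s, x)])
        s2 = surrogate_step(s, a)
        r = 1.0 if s2 > 0 else 0.0
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in range(N_ACTIONS)) - Q[(s, a)])
        s = s2
        if s == 0:
            break

policy = {s: max(range(N_ACTIONS), key=lambda a: Q[(s, a)]) for s in range(K)}
print(policy)
```

The sketch makes the transfer question concrete: the policy is optimal only with respect to the counted surrogate, so its value against the real system depends entirely on how well `counts` captures the true dynamics.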
If this is right
- The trained attacker reduces the benign consensus rate more effectively than the direct malicious-prompt baseline.
- It prolongs disagreement among agents more than the baseline does.
- Combining latent world models with reinforcement learning offers a promising direction for adaptive insider attacks in language-based multi-agent systems.
Where Pith is reading between the lines
- If the surrogate-model approach proves robust, comparable techniques might apply to other multi-agent LLM tasks such as joint planning or negotiation.
- Systems relying on LLM consensus may need defenses like message-pattern monitoring or agent-behavior verification to counter learned attacks.
- The method could be tested by swapping the underlying LLMs used by benign agents to see whether the attack transfers.
Load-bearing premise
The surrogate world model learned over latent behavioral states of benign agents accurately captures the dynamics needed for effective RL-based attack optimization in the real system.
What would settle it
Deploy the RL attacker trained on the surrogate model against real benign LLM agents in a consensus task and check whether it produces lower consensus rates and longer disagreement durations than the direct malicious-prompt baseline; if it does not, the central claim fails.
Original abstract
Large language models (LLMs) are increasingly deployed in multi-agent systems where agents communicate in natural language to solve tasks jointly. A key capability in such systems is consensus formation, where agents iteratively exchange messages and update decisions to reach a shared outcome. However, most existing multi-agent LLM frameworks assume that all participating agents are aligned with the system objective. In practice, a malicious insider may participate as a legitimate member of the group while pursuing a hidden adversarial goal. In this work, we study insider manipulation in multi-agent LLM consensus systems. We formalize the problem as a sequential decision-making task in which a malicious agent seeks to delay or prevent agreement among benign agents. To make attack optimization tractable, we propose a world-model-based framework that learns surrogate dynamics over the latent behavioral states of benign agents and then trains an attacker using reinforcement learning based on this learned model. Preliminary results show that the trained attacker reduces the benign consensus rate and prolongs disagreement more effectively than the direct malicious-prompt baseline. These results suggest that combining latent world models with reinforcement learning is a promising direction for adaptive insider attacks in language-based multi-agent systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper formalizes insider attacks in multi-agent LLM consensus systems as a sequential decision-making task where a malicious agent aims to delay or prevent agreement among benign agents. It proposes a world-model-based framework that first learns surrogate dynamics over the latent behavioral states of benign agents and then trains an RL attacker policy on this model. Preliminary results are reported showing that the trained attacker reduces the benign consensus rate and prolongs disagreement more effectively than a direct malicious-prompt baseline.
Significance. If the surrogate model is shown to be accurate and the RL policy transfers, the work would be significant for highlighting security vulnerabilities in language-based multi-agent systems and for introducing a tractable RL approach to adaptive insider attacks. It could inform the design of more robust consensus protocols. The preliminary nature of the results, however, makes the current significance tentative.
major comments (2)
- [Abstract] The effectiveness claim rests on 'preliminary results' showing the RL attacker outperforms the baseline, but no experimental details are provided (e.g., consensus task definition, metrics for consensus rate and disagreement duration, number of trials, statistical tests, or exact baseline implementation). This prevents assessment of whether the data support the central claim.
- Framework description (implied in abstract): The approach requires that the learned surrogate dynamics over latent behavioral states accurately capture real LLM interactions for the RL-optimized policy to transfer. No surrogate validation metrics, ablation on model fidelity, or discussion of sim-to-real gaps (e.g., stochastic response generation or semantic drift) are mentioned, which is load-bearing for the reported improvement over the baseline.
minor comments (1)
- [Abstract] The phrase 'latent behavioral states' is used without any indication of how these states are extracted or represented from natural-language messages.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions.
read point-by-point responses
- Referee: [Abstract] The effectiveness claim rests on 'preliminary results' showing the RL attacker outperforms the baseline, but no experimental details are provided (e.g., consensus task definition, metrics for consensus rate and disagreement duration, number of trials, statistical tests, or exact baseline implementation). This prevents assessment of whether the data support the central claim.
Authors: We agree that the abstract omits key experimental details, limiting evaluation of the claims. The manuscript presents only preliminary results without these specifics. In revision we will expand the abstract to define the consensus task (iterative natural-language exchanges toward a shared binary decision), specify metrics (consensus rate as the fraction of trials reaching agreement within a round limit; disagreement duration as average rounds to agreement or timeout), state the number of trials, note any statistical tests, and describe the baseline as a fixed adversarial prompt. A new experimental section will supply full methodology. revision: yes
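The two metrics the authors commit to (consensus rate within a round limit; disagreement duration as average rounds to agreement, with timeouts counted at the limit) are simple to compute from per-trial logs. A minimal sketch under those definitions follows; the trial data are invented for illustration, not results from the paper.

```python
def consensus_rate(trials, round_limit):
    """Fraction of trials reaching agreement within the round limit.

    Each entry is the round index at which agreement occurred, or None
    if the agents never agreed before the episode ended.
    """
    hits = sum(1 for r in trials if r is not None and r <= round_limit)
    return hits / len(trials)

def disagreement_duration(trials, round_limit):
    """Average rounds to agreement, counting timeouts as the round limit."""
    return sum(round_limit if r is None or r > round_limit else r
               for r in trials) / len(trials)

# Invented example logs (rounds to agreement per trial):
baseline = [3, 4, None, 5, 2, 6, 4, 3]      # fixed malicious prompt
rl_attack = [7, None, None, 9, None, 8, 6, None]  # trained RL attacker

print(consensus_rate(baseline, 10), consensus_rate(rl_attack, 10))
print(disagreement_duration(baseline, 10), disagreement_duration(rl_attack, 10))
```

On these invented logs the RL attacker shows a lower consensus rate and a longer disagreement duration, which is the shape of the comparison the referee asks the authors to report with trial counts and statistical tests.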
- Referee: [—] Framework description (implied in abstract): The approach requires that the learned surrogate dynamics over latent behavioral states accurately capture real LLM interactions for the RL-optimized policy to transfer. No surrogate validation metrics, ablation on model fidelity, or discussion of sim-to-real gaps (e.g., stochastic response generation or semantic drift) are mentioned, which is load-bearing for the reported improvement over the baseline.
Authors: We concur that surrogate fidelity is essential for policy transfer and is not addressed in the current manuscript. We will add a subsection on world-model training that reports validation metrics (e.g., prediction error on held-out benign-agent transitions), includes ablations on latent-state dimensionality and model capacity, and discusses sim-to-real gaps such as LLM output stochasticity and semantic drift across extended dialogues. These additions will strengthen support for the observed gains over the baseline. revision: yes
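One concrete form the promised validation could take: fit the surrogate on a training split of observed transitions and report its held-out negative log-likelihood and top-1 next-state prediction accuracy. The sketch below uses a count-based surrogate with add-one smoothing over discretized latent states; the "real" dynamics are a synthetic stand-in, and all details are illustrative assumptions rather than the paper's method.

```python
import math
import random
from collections import defaultdict

random.seed(1)
K, A = 4, 3  # discretized latent states, attacker action classes

def real_transition(s, a):
    # Stand-in for observed benign-agent dynamics: drift toward consensus,
    # with the drift probability weakened by more disruptive actions.
    p_down = 0.7 - 0.15 * a
    return max(0, s - 1) if random.random() < p_down else s

data = [(s, a, real_transition(s, a))
        for _ in range(6000)
        for s, a in [(random.randrange(K), random.randrange(A))]]
train, held_out = data[:5000], data[5000:]

# Count-based surrogate P(s' | s, a) with add-one smoothing.
counts = defaultdict(lambda: [1] * K)
for s, a, s2 in train:
    counts[(s, a)][s2] += 1

def prob(s, a, s2):
    row = counts[(s, a)]
    return row[s2] / sum(row)

# Held-out validation: average NLL per transition and top-1 accuracy.
nll = -sum(math.log(prob(s, a, s2)) for s, a, s2 in held_out) / len(held_out)
acc = sum(1 for s, a, s2 in held_out
          if max(range(K), key=lambda x: prob(s, a, x)) == s2) / len(held_out)
print(f"held-out NLL per transition: {nll:.3f}, top-1 accuracy: {acc:.3f}")
```

A learned surrogate for real LLM agents would replace the count table with a sequence model over latent embeddings, but the validation protocol (train/held-out split, likelihood and accuracy on transitions) carries over directly.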
Circularity Check
No significant circularity; empirical results provide independent validation against baseline
full rationale
The paper describes a standard world-model + RL pipeline for training an insider attacker and reports preliminary empirical results comparing its performance to a direct malicious-prompt baseline on the real multi-agent LLM system. No derivation step reduces by construction to its own inputs: the surrogate is learned from observed benign trajectories, the RL policy is optimized on that model, and effectiveness is measured via actual consensus rates in the target environment. The central claim is falsifiable via the reported comparison and does not rely on self-citation chains, uniqueness theorems, or renaming of known results. This is the common case of a data-driven method whose validity rests on experimental transfer rather than definitional equivalence.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
we learn a surrogate world model that predicts how the attacker’s visible neighborhood evolves after an adversarial intervention... P_θ( y^{t+1}_{N_k} | y^t_{N_k}, ψ_{N_k}, a^t_adv )
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · LogicNat.induction · unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
the attacker’s objective is to prevent, delay, or degrade consensus among the benign agents... r^t_adv = 1{Δ(y^t_B)>0}
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.