pith. machine review for the scientific record.

arxiv: 2605.06158 · v1 · submitted 2026-05-07 · 💻 cs.CR

Recognition: unknown

Stateful Agent Backdoor

Honglong Chen, Jiaxiong Tang, Liantao Wu, Peng Sun, Zhengchunmin Dai

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 09:19 UTC · model grok-4.3

classification 💻 cs.CR
keywords backdoor attack · LLM agent · stateful attack · Mealy machine · security vulnerability · persistent components

The pith

Stateful backdoors enable LLM agents to execute incremental attacks across multiple sessions after a single trigger injection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes extending backdoor attacks on LLM-based agents from stateless single-session behaviors to stateful ones that persist across sessions. It models the attack as a Mealy machine to decompose it into independent transitions, each with its own trigger data. This allows the attack to maintain state using persistent components under permission isolation, achieving high success rates. A sympathetic reader would care because real-world agents interact over multiple sessions, making stateless backdoors insufficient for long-term compromise.

Core claim

We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation by maintaining state through persistent components, enabling autonomous, incremental execution following a one-time trigger. We model the attack as a Mealy machine and derive a decomposition framework for independent per-transition data construction, instantiated with a primary attack achieving 80-95% success across models.

What carries the argument

A decomposition framework derived from modeling the backdoor as a Mealy machine, allowing independent construction of data for each transition in the attack sequence.
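The Mealy-machine framing can be made concrete with a small sketch. The names and topology below are ours, not the paper's: a two-transition chain in which each transition fires only on its own trigger, and the machine's current state stands in for data kept in a persistent component between sessions.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Mealy-machine framing M = (S, Σ, Λ, δ, λ, s_init):
# each transition fires only on its own trigger and emits one incremental
# attack step; `state` stands in for the persistent component.

@dataclass
class MealyBackdoor:
    delta: dict            # δ: (state, trigger) -> next state
    lam: dict              # λ: (state, trigger) -> output (one attack step)
    state: str = "s0"      # s_init; persisted across sessions in the real attack

    def step(self, trigger):
        key = (self.state, trigger)
        if key not in self.delta:
            return None                   # benign input: no transition, no output
        out = self.lam[key]
        self.state = self.delta[key]      # state survives to the next session
        return out

m = MealyBackdoor(
    delta={("s0", "t1"): "s1", ("s1", "t2"): "s2"},
    lam={("s0", "t1"): "recon", ("s1", "t2"): "exfiltrate"},
)
assert m.step("benign request") is None  # no trigger, no behavior
assert m.step("t1") == "recon"           # session 1: first increment
assert m.step("t2") == "exfiltrate"      # session 2: second increment
```

Because each transition depends only on its own (state, trigger) pair, trigger data can in principle be constructed per transition, which is the independence the decomposition framework relies on.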

If this is right

  • The primary instantiation achieves 80% to 95% attack success rate across four different models.
  • Per-transition analysis confirms the effectiveness of the decomposition approach.
  • Extensibility variants using alternative topologies and persistent components maintain consistent effectiveness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach highlights potential vulnerabilities in agent systems that allow persistent storage across sessions.
  • Developers of LLM agents may need to implement stricter isolation or reset mechanisms for persistent components.
  • Future attacks could explore more complex state machines for longer attack sequences.

Load-bearing premise

Persistent components can reliably maintain attack state across sessions under permission isolation without detection or reset by the system.

What would settle it

A test where the system enforces permission isolation or resets persistent storage between sessions and checks if the attack state is lost, causing the incremental execution to fail.
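One way to operationalize that test, sketched under our own assumptions (a dict standing in for the persistent component, a two-step trigger chain; none of these names come from the paper): clear the store between sessions and check that the second transition can no longer fire.

```python
store = {}  # stands in for the agent's persistent component

def run_session(trigger, reset_between_sessions):
    # Hypothetical harness: one session = one trigger observation.
    if reset_between_sessions:
        store.clear()                        # defense: wipe persistent state
    state = store.get("state", "s0")
    delta = {("s0", "t1"): "s1", ("s1", "t2"): "s2"}
    nxt = delta.get((state, trigger))
    if nxt is None:
        return False                         # transition did not fire
    store["state"] = nxt
    return True

# Without resets, the two-step chain completes across sessions...
assert run_session("t1", reset_between_sessions=False)
assert run_session("t2", reset_between_sessions=False)

# ...with a per-session reset, the second transition never fires.
store.clear()
assert run_session("t1", reset_between_sessions=True)
assert not run_session("t2", reset_between_sessions=True)
```

If the paper's attack still succeeds under a harness like this, the load-bearing premise fails in the other direction: state is being carried somewhere other than the monitored persistent component.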

Figures

Figures reproduced from arXiv: 2605.06158 by Honglong Chen, Jiaxiong Tang, Liantao Wu, Peng Sun, Zhengchunmin Dai.

Figure 1. Overview of the stateful agent backdoor. Each row represents a session in which the agent …
Figure 2. State transition diagram of the Mealy machine
Figure 3. State transition diagram of the Mealy machine for the branch-and-merge instantiation.
read the original abstract

Existing backdoor attacks on Large Language Model-based agents remain stateless, executing fixed behaviors confined to a single session. We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation. The attack maintains state through persistent components, enabling autonomous, incremental execution across sessions following a one-time trigger injection. Formally, we model the attack as a Mealy machine and derive a decomposition framework that enables independent per-transition data construction. We instantiate this framework with a primary attack and two extensibility variants. The primary instantiation achieves an attack success rate of 80%–95% across four models, with per-transition analysis demonstrating the effectiveness of the decomposition. Extensibility variants with alternative topologies and persistent components demonstrate consistent effectiveness. Code and data are available at https://anonymous.4open.science/r/stateful_agent_backdoor-E89F.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a stateful backdoor attack on LLM-based agents that persists across multiple sessions via persistent components under permission isolation, unlike prior stateless backdoors. It models the attack as a Mealy machine and derives a decomposition framework enabling independent per-transition data construction. The primary instantiation reports 80-95% attack success rates across four models, with per-transition analysis supporting the decomposition's effectiveness; two extensibility variants using alternative topologies and persistent components are also evaluated and show consistent results. Code and data are released for reproducibility.

Significance. If the empirical results hold, the work meaningfully extends backdoor research by demonstrating how state persistence can enable incremental, autonomous attacks across sessions in agent systems. The Mealy-machine formalization and decomposition provide a structured, reusable construction method, and the code release supports verification and extension. This could inform defenses for multi-session agent deployments where stateful threats have not been a primary focus.

major comments (2)
  1. [Experimental Evaluation] The central empirical claim of 80-95% ASR relies on the per-transition analysis demonstrating decomposition effectiveness; the manuscript should explicitly report the number of transitions tested, the exact data-construction procedure per transition, and any statistical controls for variance across sessions to confirm the independence assumption is not violated by agent memory or context carry-over.
  2. [Threat Model and Persistence Mechanism] The weakest assumption noted in the threat model—that persistent components reliably survive permission isolation without reset or detection—is load-bearing for the multi-session claim; the experiments should include an ablation or failure-mode analysis showing what happens when the persistent component is cleared or monitored between sessions.
minor comments (2)
  1. [Abstract] The abstract mentions four models and 80-95% ASR but does not name the models or briefly note the baseline comparison (e.g., stateless backdoors); adding one sentence would improve immediate context.
  2. [Formal Modeling] Notation for the Mealy-machine states and transitions could be introduced with a small diagram or table in the formal section to make the decomposition mapping clearer to readers unfamiliar with automata.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review of our manuscript. We address the major comments point by point below and agree to incorporate revisions where appropriate to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Experimental Evaluation] The central empirical claim of 80-95% ASR relies on the per-transition analysis demonstrating decomposition effectiveness; the manuscript should explicitly report the number of transitions tested, the exact data-construction procedure per transition, and any statistical controls for variance across sessions to confirm the independence assumption is not violated by agent memory or context carry-over.

    Authors: We agree that these details are necessary to fully support the claims. In the revised manuscript, we will explicitly report the number of transitions tested, provide the precise data-construction procedure used for each transition, and present statistical results from repeated experiments across sessions to verify that the independence assumption holds and that there is no significant variance due to memory carry-over. revision: yes

  2. Referee: [Threat Model and Persistence Mechanism] The weakest assumption noted in the threat model—that persistent components reliably survive permission isolation without reset or detection—is load-bearing for the multi-session claim; the experiments should include an ablation or failure-mode analysis showing what happens when the persistent component is cleared or monitored between sessions.

    Authors: This is a valid point about the assumptions in the threat model. We will add an ablation study in the revised manuscript that analyzes the attack success when the persistent component is cleared between sessions, showing the necessity of state persistence for the incremental attack. Regarding monitoring, we will discuss it as a potential defense direction in the limitations section. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is an empirical security construction: it models the attack as a Mealy machine, derives a decomposition to support per-transition data construction, then instantiates the framework, measures 80-95% ASR on four models, and releases code. No equations, fitted parameters, or self-citations are shown to reduce the reported success rates or the decomposition's effectiveness to inputs by construction. The central results are externally falsifiable experimental outcomes rather than tautological re-statements of the modeling assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on modeling agents as Mealy machines to enable decomposition and on the existence of usable persistent components under permission isolation.

axioms (1)
  • domain assumption: LLM-based agents can be modeled as Mealy machines whose transitions can be independently triggered and constructed.
    Invoked to derive the decomposition framework for per-transition data construction.
invented entities (1)
  • Stateful agent backdoor (no independent evidence)
    purpose: To maintain and incrementally execute malicious behavior across isolated sessions
    The attack mechanism is defined and instantiated in the paper.
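For readers unfamiliar with automata, the domain assumption can be written with the standard Mealy tuple. This is a textbook rendering (Mealy, 1955); the paper's exact notation may differ:

```latex
M = (S, \Sigma, \Lambda, \delta, \lambda, s_{\mathrm{init}}), \qquad
\delta : S \times \Sigma \to S, \qquad
\lambda : S \times \Sigma \to \Lambda
```

Here S is the set of attack states, \Sigma the trigger inputs, \Lambda the outputs (attack steps), \delta the transition function, \lambda the output function, and s_init the initial state; the decomposition claim is that trigger data for each transition (s, \sigma) can be constructed independently of the others.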

pith-pipeline@v0.9.0 · 5443 in / 1150 out tokens · 53400 ms · 2026-05-08T09:19:50.132947+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

40 extracted references · 18 canonical work pages · 9 internal anchors

  1. [1]

    ReST meets react: Self-improvement for multi-step reasoning LLM agent

    Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, and Sanjiv Kumar. ReST meets react: Self-improvement for multi-step reasoning LLM agent. In ICLR 2024 Workshop on Large Language Model (LLM) Agents, 2024. URL https://openreview...

  2. [2]

    MAIN-RAG: Multi-agent filtering retrieval-augmented generation

    Chia-Yuan Chang, Zhimeng Jiang, Vineeth Rakesh, Menghai Pan, Chin-Chia Michael Yeh, Guanchu Wang, Mingzhi Hu, Zhichao Xu, Yan Zheng, Mahashweta Das, and Na Zou. MAIN-RAG: Multi-agent filtering retrieval-augmented generation. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Proceedings of the 63rd Annual Meeting of t...

  3. [3]

    Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases

    Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. Agentpoison: Red-teaming LLM agents via poisoning memory or knowledge bases. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=Y841BRW9rY

  4. [4]

    TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models, 2024

    Pengzhou Cheng, Yidong Ding, Tianjie Ju, Zongru Wu, Wei Du, Haodong Zhao, Ping Yi, Zhuosheng Zhang, and Gongshen Liu. TrojanRAG: Retrieval-augmented generation can be backdoor driver in large language models, 2024. URL https://openreview.net/forum?id=RfYD6v829Y

  5. [5]

    Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building production-ready AI agents with scalable long-term memory. arXiv preprint arXiv:2504.19413, 2025

  6. [6]

    CrewAI Inc. CrewAI. https://github.com/crewAIInc/crewAI, 2026. Version 1.14.4; accessed May 6, 2026

  7. [7]

    Deepseek-v4: Towards highly efficient million-token context intelligence

    DeepSeek-AI. Deepseek-v4: Towards highly efficient million-token context intelligence. URL https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro/blob/main/DeepSeek_V4.pdf. Accessed: May 6, 2026

  9. [9]

    QLoRA: Efficient finetuning of quantized LLMs

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient finetuning of quantized LLMs. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=OUIFPHEgJU

  10. [10]

    Memory injection attacks on LLM agents via query-only interaction

    Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. Memory injection attacks on LLM agents via query-only interaction. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=QINnsnppv8

  11. [11]

    BackdoorAgent: A unified framework for backdoor attacks on LLM-based agents

    Yunhao Feng, Yige Li, Yutao Wu, Yingshui Tan, Yanming Guo, Yifan Ding, Kun Zhai, Xingjun Ma, and Yu-Gang Jiang. BackdoorAgent: A unified framework for backdoor attacks on LLM-based agents. arXiv preprint arXiv:2601.04566, 2026. URL https://arxiv.org/abs/2601.04566

  12. [12]

    Retrieval-Augmented Generation for Large Language Models: A Survey

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey, 2023. URL https://arxiv.org/abs/2312.10997

  13. [13]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. URL https://arxiv.org/abs/2407.21783

  14. [14]

    Memo: Training memory-efficient embodied agents with reinforcement learning

    Gunshi Gupta, Karmesh Yadav, Zsolt Kira, Yarin Gal, and Rahaf Aljundi. Memo: Training memory-efficient embodied agents with reinforcement learning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URL https://openreview.net/forum?id=9eIntNc69t

  15. [15]

    Measuring massive multitask language understanding

    Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=d7KBjmI3GmQ

  16. [16]

    LangGraph SDK for Python

    LangChain AI. LangGraph SDK for Python. https://github.com/langchain-ai/langgraph/tree/main/libs/sdk-py, 2026. Version 0.3.14; accessed May 6, 2026

  17. [17]

    Ministral 3

    Alexander H. Liu, Kartik Khandelwal, Sandeep Subramanian, Victor Jouault, Abhinav Rastogi, et al. Ministral 3. arXiv preprint arXiv:2601.08584, 2026

  18. [18]

    Decoupled weight decay regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

  19. [19]

    Evaluating very long-term conversational memory of LLM agents

    Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. Evaluating very long-term conversational memory of LLM agents. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13851–13870, Bangkok, Thailand, 2024. Association for Computational Lingui...

  20. [20]

    George H. Mealy. A method for synthesizing sequential circuits. Bell System Technical Journal, 34(5):1045–1079, 1955. doi: 10.1002/j.1538-7305.1955.tb03788.x

  21. [21]

    Memory in microsoft foundry agent service (preview), 2026

    Microsoft. Memory in microsoft foundry agent service (preview), 2026. URL https://learn.microsoft.com/en-us/azure/foundry/agents/concepts/what-is-memory. Accessed: May 6, 2026

  22. [22]

    Microsoft Agent Framework for Python

    Microsoft. Microsoft Agent Framework for Python. https://github.com/microsoft/agent-framework/, 2026. Version python-1.2.2; accessed May 6, 2026

  23. [23]

    Evaluation and benchmarking of LLM agents: A survey

    Mahmoud Mohammadi, Yipeng Li, Jane Lo, and Wendy Yip. Evaluation and benchmarking of LLM agents: A survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, KDD '25, pages 6129–6139, New York, NY, USA, 2025. Association for Computing Machinery. ISBN 9798400714542. doi: 10.1145/3711896.3736570. URL https://doi.org/...

  24. [24]

    OpenAI GPT-5 System Card

    OpenAI. Gpt-5 system card, 2026. URL https://arxiv.org/abs/2601.03267. arXiv preprint arXiv:2601.03267

  25. [25]

    OpenAI Agents SDK for Python

    OpenAI. OpenAI Agents SDK for Python. https://github.com/openai/openai-agents-python, 2026. Version 0.15.3; accessed May 6, 2026

  26. [26]

    MemGPT: Towards LLMs as Operating Systems

    Charles Packer, Sarah Wooders, Kevin Lin, Vivian Fang, Shishir G. Patil, Ion Stoica, and Joseph E. Gonzalez. MemGPT: Towards LLMs as operating systems, 2023. URL https://arxiv.org/abs/2310.08560. arXiv preprint arXiv:2310.08560

  27. [27]

    Qwen2.5 Technical Report

    Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, et al. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024

  28. [28]

    Flash: A comprehensive approach to intrusion detection via provenance graph representation learning

    Mati Ur Rehman, Hadi Ahmadi, and Wajih Ul Hassan. Flash: A comprehensive approach to intrusion detection via provenance graph representation learning. In 2024 IEEE Symposium on Security and Privacy (SP), pages 3552–3570, 2024. doi: 10.1109/SP54263.2024.00139

  29. [29]

    Expert insights into advanced persistent threats: Analysis, attribution, and challenges

    Aakanksha Saha, James Mattei, Jorge Blasco, Lorenzo Cavallaro, Daniel Votipka, and Martina Lindorfer. Expert insights into advanced persistent threats: Analysis, attribution, and challenges. In 34th USENIX Security Symposium (USENIX Security 25), pages 2185–2204, Seattle, WA, 2025. USENIX Association. URL https://www.usenix.org/conference/usenixsecurit...

  30. [30]

    BadAgent: Inserting and activating backdoor attacks in LLM agents

    Yifei Wang, Dizhan Xue, Shengjie Zhang, and Shengsheng Qian. BadAgent: Inserting and activating backdoor attacks in LLM agents. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9811–9827, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi: 10.18653/v1/2024.acl-long.530. URL https://aclanthology.org/2024.acl-long.530/

  32. [32]

    A-mem: Agentic memory for LLM agents

    Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. A-mem: Agentic memory for LLM agents. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025. URL https://openreview.net/forum?id=FiM0M8gcct

  33. [33]

    Qwen3 Technical Report

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, ...

  34. [34]

    Watch out for your agents! investigating backdoor threats to llm-based agents

    Wenkai Yang, Xiaohan Bi, Yankai Lin, Sishuo Chen, Jie Zhou, and Xu Sun. Watch out for your agents! Investigating backdoor threats to LLM-based agents. In Advances in Neural Information Processing Systems, volume 37, pages 100938–100964. Curran Associates, Inc., 2024. URL https://proceedings.neurips.cc/paper_files/paper/2024/file/b6e9d6f4f3428cd5f3f9e9bb...

  35. [35]

    ReAct: Synergizing Reasoning and Acting in Language Models

    Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations (ICLR), 2023. URL https://arxiv.org/abs/2210.03629

  36. [36]

    A survey on advanced persistent threat detection: A unified framework, challenges, and countermeasures

    Bo Zhang, Yansong Gao, Boyu Kuang, Changlong Yu, Anmin Fu, and Willy Susilo. A survey on advanced persistent threat detection: A unified framework, challenges, and countermeasures. ACM Computing Surveys, 57(3), 2024. doi: 10.1145/3700749. URL https://doi.org/10.1145/3700749

  37. [37]

    Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents

    Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, and Yongfeng Zhang. Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=V4y0CpX4hK

  38. [38]

    DemonAgent: Dynamically encrypted multi-backdoor implantation attack on LLM-based agent

    Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, and Sen Su. DemonAgent: Dynamically encrypted multi-backdoor implantation attack on LLM-based agent. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, pages 2890–2912, Suzhou, C...

  39. [39]

    PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models

    Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models. In Proceedings of the 34th USENIX Conference on Security Symposium, SEC '25, USA, 2025. USENIX Association. ISBN 978-1-939133-52-6
