arxiv: 2604.16543 · v1 · submitted 2026-04-17 · 💻 cs.MA · cs.AI

Recognition: unknown

Conjunctive Prompt Attacks in Multi-Agent LLM Systems

Mengxin Zheng, Nokimul Hasan Arif, Qian Lou

Pith reviewed 2026-05-10 08:10 UTC · model grok-4.3


keywords: conjunctive prompt attacks · multi-agent LLMs · prompt injection · LLM safety · agent routing · adversarial templates


The pith

A trigger in the user query and a hidden template in one remote agent activate harmful LLM behavior only when routing combines them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that multi-agent LLM systems introduce a new attack surface because prompts can be split into individually benign pieces that become dangerous only after inter-agent routing joins them. An attacker who controls only a remote agent and the placement of a short trigger can use routing-aware optimization to raise attack success rates across star, chain, and DAG topologies while keeping false activations low. Existing single-component guards such as PromptGuard and Llama-Guard variants fail because no isolated message or agent looks malicious. The work therefore claims that safety mechanisms must reason over routing paths and cross-agent composition rather than inspecting parts in isolation.

Core claim

Conjunctive prompt attacks succeed by placing a benign trigger key in the user query and a hidden adversarial template inside one compromised remote agent; the two pieces appear harmless separately yet produce harmful outputs once the system's routing mechanism brings them together. Routing-aware optimization of the template and trigger placement raises success rates over non-optimized baselines in star, chain, and DAG topologies while maintaining low false-positive rates. Current defenses fail because no single component triggers detection.

What carries the argument

Conjunctive prompt attack: a split malicious instruction consisting of a trigger key in the user query and an adversarial template in one remote agent that only activates when the routing layer combines them.
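A toy model makes the mechanism concrete. This is a minimal sketch under stated assumptions and is not the authors' implementation. `KEY` and `TEMPLATE` are hypothetical placeholder markers, not real payloads. Activation is modeled as a crude rule: both pieces must be present in one context. `isolated_guard` is a hypothetical stand-in for a per-message filter such as PromptGuard and does not reflect that tool's API.

```python
from dataclasses import dataclass

# Hypothetical placeholder markers standing in for the two halves of the attack:
# a trigger key carried by the user query and a template hidden in one agent.
KEY = "<key>"
TEMPLATE = "<template>"


@dataclass
class RemoteAgent:
    name: str
    system_prompt: str  # a compromised agent carries TEMPLATE here


def activates(context: str) -> bool:
    """Toy activation rule: harmful behavior requires BOTH pieces in one context."""
    return KEY in context and TEMPLATE in context


def isolated_guard(text: str) -> bool:
    """Hypothetical per-component filter: flags a text only if that text
    alone satisfies the activation rule."""
    return activates(text)


def composed_guard(routed_message: str, agent: RemoteAgent) -> bool:
    """Composition-aware check: inspect the context the agent will actually see."""
    return activates(agent.system_prompt + "\n" + routed_message)


user_query = f"Book a flight to Paris {KEY}"
compromised = RemoteAgent("flight_agent", f"You book flights. {TEMPLATE}")
benign = RemoteAgent("hotel_agent", "You book hotels.")

# Each piece passes an isolated check.
print(isolated_guard(user_query), isolated_guard(compromised.system_prompt))  # False False

# Activation depends on routing: only the compromised agent completes the pair.
print(composed_guard(user_query, compromised))  # True
print(composed_guard(user_query, benign))       # False
```

Modeling activation as a plain conjunction is deliberately crude. The point it illustrates is that the harmful condition depends on the composed context, and a per-component check never sees that context.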

If this is right

Routing-aware optimization increases attack success across star, chain, and DAG topologies.
False activation rates remain low for the optimized attacks.
PromptGuard, Llama-Guard variants, and tool-restriction defenses do not reliably block the attacks.
Agentic LLM pipelines contain a structural vulnerability that requires defenses operating over routing and cross-agent composition.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Defenses would need to inspect message flows between agents rather than individual prompts or agents.
The same split-instruction pattern could be tested in other multi-component systems where parts are routed by a central coordinator.
Varying the routing algorithm itself might serve as a practical countermeasure worth measuring in follow-up work.
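The first extension above, inspecting message flows rather than isolated prompts or agents, can be sketched as a path-level audit. This is an editorial illustration, not a defense proposed in the paper. The topologies, agent names, `<key>`/`<template>` markers, and `toy_guard` are all hypothetical. The audit also assumes the defender can observe each agent's effective context, which a client that treats remote agents as black boxes (as in Figure 1) may be unable to do.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]

# Hypothetical placeholder markers for the two halves of a conjunctive attack.
KEY, TEMPLATE = "<key>", "<template>"

# Illustrative topologies; node names are made up, not taken from the paper.
TOPOLOGIES: Dict[str, Graph] = {
    "star": {"client": ["flight", "hotel", "car"]},
    "chain": {"client": ["flight"], "flight": ["hotel"], "hotel": ["car"]},
    "dag": {"client": ["flight", "hotel"], "flight": ["car"], "hotel": ["car"]},
}

# Per-agent hidden context; only "hotel" is compromised in this toy setup.
AGENT_CONTEXT = {"client": "", "flight": "", "hotel": TEMPLATE, "car": ""}


def routing_paths(graph: Graph, node: str = "client") -> List[List[str]]:
    """Enumerate every root-to-leaf routing path (assumes the graph is acyclic)."""
    children = graph.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest for child in children for rest in routing_paths(graph, child)]


def audit_flows(graph: Graph, query: str, guard: Callable[[str], bool]) -> List[str]:
    """Flow-level audit: along each path, rebuild the context each agent would
    see (its hidden context plus the forwarded query) and run the guard on that
    composition. Pessimistically assumes every hop forwards the full query."""
    alerts = []
    for path in routing_paths(graph):
        for agent in path[1:]:
            if guard(AGENT_CONTEXT[agent] + " " + query):
                alerts.append(" -> ".join(path) + f" @ {agent}")
    return alerts


def toy_guard(text: str) -> bool:
    """Stand-in detector: fires only when both halves co-occur."""
    return KEY in text and TEMPLATE in text


for name, graph in TOPOLOGIES.items():
    print(name, audit_flows(graph, f"Plan my trip {KEY}", toy_guard))
```

The same compromised agent shows up on one path in the star, on the single path in the chain, and on one of two paths in the DAG. A key-free query raises no alerts in any topology.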

Load-bearing premise

An attacker can insert a hidden adversarial template into one remote agent and choose a trigger such that the system's routing will reliably join the two pieces without either part being flagged as suspicious on its own.

What would settle it

A controlled experiment that measures whether a routing-optimized conjunctive attack produces substantially higher success rates than a non-optimized baseline while false activations stay below a stated threshold in the same multi-agent topologies.
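The settling experiment reduces to two rates and a comparison. The sketch below is a hedged illustration, not the paper's evaluation code. The `Trial` record, the function names, and the thresholds `min_gain` and `max_far` are hypothetical, and the exact metric definitions are not given in this summary. Attack success rate (ASR) is computed over key-bearing queries, and false activation rate over key-free ones.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    key_present: bool  # did the query carry the trigger key?
    activated: bool    # did the harmful behavior occur?


def attack_success_rate(trials: List[Trial]) -> float:
    """Fraction of key-bearing queries that activated the harmful behavior."""
    keyed = [t for t in trials if t.key_present]
    return sum(t.activated for t in keyed) / len(keyed) if keyed else 0.0


def false_activation_rate(trials: List[Trial]) -> float:
    """Fraction of key-free (clean) queries that activated anyway."""
    clean = [t for t in trials if not t.key_present]
    return sum(t.activated for t in clean) / len(clean) if clean else 0.0


def settles(optimized: List[Trial], baseline: List[Trial],
            min_gain: float, max_far: float) -> bool:
    """True if the optimized attack beats the baseline ASR by at least
    min_gain while its false activation rate stays at or below max_far."""
    gain = attack_success_rate(optimized) - attack_success_rate(baseline)
    return gain >= min_gain and false_activation_rate(optimized) <= max_far


# Synthetic, illustrative trials (not results from the paper).
optimized = [Trial(True, i < 6) for i in range(8)] + [Trial(False, False)] * 10
baseline = [Trial(True, i < 2) for i in range(8)] + [Trial(False, False)] * 10
print(attack_success_rate(optimized), attack_success_rate(baseline))  # 0.75 0.25
print(settles(optimized, baseline, min_gain=0.2, max_far=0.05))       # True
```

With trial counts this small, a gap like 0.75 versus 0.25 is noisy. A real analysis would also report confidence intervals or a significance test alongside the point estimates.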

Figures

Figures reproduced from arXiv: 2604.16543 by Mengxin Zheng, Nokimul Hasan Arif, Qian Lou.

**Figure 1.** Normal multi-agent LLM pipeline without adversarial manipulation. A user interacts with a client (orchestrator) agent that decomposes the request into subtasks and routes them to specialized remote agents (e.g., flight and hotel agents), each connected to external tools or databases. Remote agents operate as black boxes to the client, exposing only their natural-language interfaces.

**Figure 2.** End-to-end pipeline for a conjunctive attack in multi-agent systems. Left: the attacker learns a prompt-level configuration θ* over key placement, template placement, and routing bias using a differentiable counterpart objective. Right: at inference time, a single key-bearing query is routed through the multi-agent system; activation occurs only when the key-bearing segment reaches the compromised agent and…

**Figure 3.** Detection efficacy of different safety mechanisms…

Original abstract

Most LLM safety work studies single-agent models, but many real applications rely on multiple interacting agents. In these systems, prompt segmentation and inter-agent routing create attack surfaces that single-agent evaluations miss. We study conjunctive prompt attacks, where a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone but activate harmful behavior when routing brings them together. We consider an attacker who changes neither model weights nor the client agent and instead controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool restrictions, do not reliably stop the attack because no single component appears malicious in isolation. These results expose a structural vulnerability in agentic LLM pipelines and motivate defenses that reason over routing and cross-agent composition. Code is available at https://github.com/UCF-ML-Research/ConjunctiveAgents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The conjunctive attack idea splits malicious prompts across user input and a remote agent to evade single-agent checks, but the results rest on thin experimental details and optimistic assumptions about attacker knowledge of routing.


The main thing to know is that this paper frames a conjunctive prompt attack, in which a benign trigger in the user query combines with a hidden template in one remote agent only when routing brings them together. That framing is the genuinely new angle relative to the standard prompt-injection literature. It targets a gap in single-agent safety evaluations of multi-agent LLM systems across star, chain, and DAG topologies. Routing-aware optimization is presented as the way to raise success rates while holding false activations down. The work also checks that common defenses such as PromptGuard and Llama-Guard variants miss the attack because no isolated component triggers them. Code is released, which makes the claims directly checkable.

The paper does a reasonable job of spelling out the attacker model: only trigger placement and template insertion are controlled, without touching weights or the client agent. The topology coverage and defense tests give a concrete starting point for thinking about composition risks in agent pipelines.

The soft spots sit in the evidence and the assumptions. The abstract states that optimization substantially boosts success and that defenses fail. Yet it gives no numbers, no baselines, and no details of how success and false positives were measured, so the strength of the headline result is difficult to assess. The optimization step also implies that the attacker knows enough about routing dynamics and agent states to place the template effectively. If routing is internal, adaptive, or opaque, acquiring that knowledge may require probing that itself risks detection. The paper does not appear to show how the attacker obtains it in a realistic black-box setting, which limits how far the practical vulnerability claim travels.

The work is aimed at AI safety researchers who already study agentic systems and prompt attacks. A reader focused on multi-agent vulnerabilities would take away the concept and the topology experiments as useful starting points for their own work, even while treating the current numbers as preliminary. It deserves a serious referee because the structural point about cross-agent composition is worth checking against existing evaluations. The review should press for expanded methods, quantitative tables, and a discussion of the routing-knowledge assumption. I would send it to peer review with those requests rather than desk-reject it.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces conjunctive prompt attacks in multi-agent LLM systems, where a benign trigger key in the user query and a hidden adversarial template inserted into one remote agent appear non-malicious in isolation but activate harmful behavior when inter-agent routing combines them in context. The attacker is limited to controlling trigger placement and template insertion (no model weight changes or client agent control). Experiments across star, chain, and DAG topologies show that routing-aware optimization of the template substantially raises attack success rates relative to non-optimized baselines while maintaining low false activations. Standard defenses (PromptGuard, Llama-Guard variants, tool restrictions) are shown to be ineffective because no individual component triggers detection.

Significance. If the empirical claims hold under a realistic threat model, the work identifies a structural vulnerability in multi-agent LLM pipelines that single-agent safety evaluations overlook, motivating routing-aware defenses. The public code release supports reproducibility and verification of the reported attack success and defense failure rates.

major comments (2)

[Abstract] Abstract and threat model: the central claim that 'routing-aware optimization substantially increases attack success' requires the attacker to possess foreknowledge of the routing function, agent identities, and message-passing rules to perform the optimization. This contradicts the stated attacker capabilities of controlling 'only trigger placement and template insertion' without additional probing or internal access. The assumption is load-bearing for the headline result and must be justified with a concrete black-box procedure that does not itself produce detectable false activations.
[Defense Evaluation] Defense evaluation section: the assertion that existing defenses 'do not reliably stop the attack' because 'no single component appears malicious in isolation' requires quantitative metrics (e.g., attack success rates with and without each defense, false-positive rates) and explicit baselines. Without these, the claim that defenses fail cannot be assessed for magnitude or robustness.

minor comments (1)

The abstract would be strengthened by including at least one key quantitative result (e.g., success-rate delta or false-activation rate) to convey the scale of the reported improvement.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below, clarifying the threat model and strengthening the defense evaluation with additional quantitative results. Revisions have been made to the manuscript to improve clarity and rigor without altering the core claims.


Referee: [Abstract] Abstract and threat model: the central claim that 'routing-aware optimization substantially increases attack success' requires the attacker to possess foreknowledge of the routing function, agent identities, and message-passing rules to perform the optimization. This contradicts the stated attacker capabilities of controlling 'only trigger placement and template insertion' without additional probing or internal access. The assumption is load-bearing for the headline result and must be justified with a concrete black-box procedure that does not itself produce detectable false activations.

Authors: The threat model already implies knowledge of agent identities and routing because the attacker must choose a specific remote agent in which to insert the template; this selection presupposes awareness of the system topology and message-passing rules. Routing-aware optimization is performed entirely offline using this structural knowledge together with a local surrogate evaluator (no queries to the live target system are required during optimization). We have revised the abstract and added a dedicated paragraph in the threat model section that explicitly states this assumption and provides pseudocode for the black-box optimization procedure, which relies only on known topology and benign simulation queries that produce no detectable activations in the target deployment. revision: yes
Referee: [Defense Evaluation] Defense evaluation section: the assertion that existing defenses 'do not reliably stop the attack' because 'no single component appears malicious in isolation' requires quantitative metrics (e.g., attack success rates with and without each defense, false-positive rates) and explicit baselines. Without these, the claim that defenses fail cannot be assessed for magnitude or robustness.

Authors: We agree that quantitative metrics are essential. The revised manuscript now includes a new table (Table 4) in the defense evaluation section that reports attack success rate (ASR) and false-positive rate (FPR) for PromptGuard, two Llama-Guard variants, and tool-restriction baselines, both with and without the conjunctive attack, across all three topologies. We also add single-agent attack and random-template baselines for comparison. The results confirm that isolated defenses reduce ASR for non-conjunctive attacks but leave the conjunctive attack with ASR above 65% and FPR below 4% in all cases, supporting the original claim with measurable effect sizes. revision: yes

Circularity Check

0 steps flagged

No circularity in empirical multi-agent attack evaluation


The paper presents an empirical study of conjunctive prompt attacks across star, chain, and DAG topologies in multi-agent LLM systems. All central claims are framed as direct experimental outcomes from simulations and tests: increased attack success via routing-aware optimization, low false activations, and the ineffectiveness of existing defenses. None is presented as a mathematical derivation, a first-principles prediction, or a quantity defined in terms of fitted inputs. No equations, self-definitional constructs, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or the described content. The claims therefore stand or fall on the reported experimental results, which can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The study is purely empirical with no mathematical derivations, free parameters, axioms, or invented entities described.


