The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro

Classification: cs.CR, cs.AI

Keywords: LLMs, agents, attack, security, systems, trust, attacks, backdoor

Abstract

The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables remarkable capabilities in natural language processing and generation. However, these systems introduce security vulnerabilities that extend beyond traditional content generation to system-level compromise. This paper presents a comprehensive evaluation of the security of LLMs used as reasoning engines within autonomous agents, highlighting how they can be exploited as attack vectors capable of achieving computer takeovers. We focus on how different attack surfaces and trust boundaries can be leveraged to orchestrate such takeovers. We demonstrate that adversaries can effectively coerce popular LLMs into autonomously installing and executing malware on victim machines. Our evaluation of 18 state-of-the-art LLMs reveals that 94.4% of models succumb to Direct Prompt Injection and that 83.3% are vulnerable to the stealthier, more evasive RAG Backdoor Attack. Notably, we tested trust boundaries within multi-agent systems, where LLM agents interact with and influence each other, and found that LLMs that successfully resist direct injection or RAG backdoor attacks will nevertheless execute identical payloads when requested by peer agents. We found that 100.0% of tested LLMs can be compromised through Inter-Agent Trust Exploitation attacks, and that every model exhibits context-dependent security behaviors that create exploitable blind spots.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Trace: Unmasking AI Attack Agents Through Terminal Behavior Fingerprinting
cs.CR 2026-05 unverdicted novelty 7.0

Trace fingerprints AI penetration-testing agents from terminal command sequences to identify model families and extracts their system prompts via targeted defensive prompt injection.

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks
cs.CR 2026-05 unverdicted novelty 6.0

Multi-agent LLM frameworks can spread compromises across agent boundaries via insecure memory inheritance during subagent spawning.

Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines
cs.CR 2026-04 unverdicted novelty 6.0

A single legitimate request can cause LLM orchestrators to output plans that violate security policies through the composition of benign subtasks, bypassing subtask-level checks.