AgenTEE: Confidential LLM Agent Execution on Edge Devices
Pith reviewed 2026-05-10 04:43 UTC · model grok-4.3
The pith
AgenTEE isolates LLM agent components in attested confidential virtual machines to enable secure edge execution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
AgenTEE places the agent runtime, inference engine, and third-party applications into independently attested confidential virtual machines (cVMs) and mediates their interaction through explicit, verifiable communication channels. Built on Arm Confidential Compute Architecture (CCA), AgenTEE enforces strong system-level isolation of sensitive assets and runtime state. Our evaluation shows that such a multi-cVM system is practical, achieving near-native performance with less than 5.15% runtime overhead compared to commodity OS multi-process deployments.
What carries the argument
Multi-cVM architecture with independent attestation and explicit mediation channels on Arm CCA, which isolates the runtime, inference engine, and third-party components to safeguard proprietary assets and state.
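The mediation pattern described above can be sketched as a toy attestation gate: a channel to a cVM is opened only if the measurement it reports matches a trusted reference. This is a minimal illustration, not AgenTEE's protocol; the component names and image bytes are invented, and real Arm CCA attestation involves signed reports rooted in the hardware root of trust rather than bare hashes.

```python
import hashlib

# Trusted reference measurements per cVM image (hypothetical stand-ins for
# the hashes a signed Arm CCA attestation report would carry).
TRUSTED = {
    "agent-runtime": hashlib.sha256(b"runtime-image-v1").hexdigest(),
    "inference-engine": hashlib.sha256(b"engine-image-v1").hexdigest(),
}

def attest(image: bytes) -> str:
    """Simulate a cVM measuring its loaded image at boot."""
    return hashlib.sha256(image).hexdigest()

def open_channel(component: str, measurement: str) -> bool:
    """Grant a mediated channel only when the measurement matches the reference."""
    return TRUSTED.get(component) == measurement

# A runtime booted from the expected image gets a channel...
ok = open_channel("agent-runtime", attest(b"runtime-image-v1"))
# ...while a tampered image is refused one.
bad = open_channel("agent-runtime", attest(b"tampered-image"))
print(ok, bad)  # True False
```

The point of the gate is that the verifier never trusts the component's identity claim, only its measurement; everything else in the pipeline builds on that check.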
If this is right
- LLM agents can run on edge devices while keeping sensitive prompts and weights protected from local software attacks.
- Third-party applications can participate in agent pipelines without gaining access to core runtime state.
- Edge-based automation becomes feasible for tasks that require both low latency and strong privacy guarantees.
- The performance cost of this isolation stays low enough to support real-time agent operation.
Where Pith is reading between the lines
- The same cVM separation pattern could support other non-LLM AI pipelines that combine models with external services on edge hardware.
- Hardware vendors might prioritize broader attestation coverage to reduce reliance on explicit channels alone.
- Developers could test the approach on additional edge platforms to map where Arm CCA support is available.
Load-bearing premise
Arm CCA attestation and the explicit mediation channels between cVMs suffice to block software attacks and malicious device owners, with no unaddressed side channels or attestation bypasses.
What would settle it
A working attack that extracts system prompts or model weights from an AgenTEE deployment on an Arm CCA edge device despite the cVM isolation and attested channels.
read the original abstract
Large Language Model (LLM) agents provide powerful automation capabilities, but they also create a substantially broader attack surface than traditional applications due to their tight integration with non-deterministic models and third-party services. While current deployments primarily rely on cloud-hosted services, emerging designs increasingly execute agents directly on edge devices to reduce latency and enhance user privacy. However, securely hosting such complex agent pipelines on edge devices remains challenging. These deployments must protect proprietary assets (e.g., system prompts and model weights) and sensitive runtime state on heterogeneous platforms that are vulnerable to software attacks and potentially controlled by malicious users. To address these challenges, we present AgenTEE, a system for deploying confidential agent pipelines on edge devices. AgenTEE places the agent runtime, inference engine, and third-party applications into independently attested confidential virtual machines (cVMs) and mediates their interaction through explicit, verifiable communication channels. Built on Arm Confidential Compute Architecture (CCA), a recent extension to Arm platforms, AgenTEE enforces strong system-level isolation of sensitive assets and runtime state. Our evaluation shows that such multi-cVMs system is practical, achieving near-native performance with less than 5.15% runtime overhead compared to commodity OS multi-process deployments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents AgenTEE, a system for confidential execution of LLM agents on edge devices. It isolates the agent runtime, inference engine, and third-party applications in independently attested confidential virtual machines (cVMs) built on Arm Confidential Compute Architecture (CCA), with interactions mediated through explicit verifiable communication channels. The work claims strong system-level isolation of sensitive assets (prompts, weights, runtime state) against software attacks and malicious device owners, while demonstrating practicality via near-native performance with less than 5.15% runtime overhead relative to commodity OS multi-process deployments.
Significance. If the security guarantees hold, the result is significant for enabling privacy-preserving LLM agent execution on untrusted edge hardware. The engineering focus on multi-cVM isolation with mediated channels and the reported low overhead provide a concrete path toward practical confidential AI on heterogeneous platforms, addressing a growing need as agents move from cloud to edge.
major comments (2)
- [Abstract] The central claim of 'strong system-level isolation' protecting against software attacks from malicious device owners rests on Arm CCA attestation and cVM mediation, but the manuscript provides no concrete evidence or analysis ruling out side-channel leakage (e.g., via shared caches or memory controllers during LLM inference) or attestation bypasses; this leaves the confidentiality guarantee only partially supported.
- [Abstract/Evaluation] The reported '< 5.15% runtime overhead' is presented as a key practicality result, yet no details are given on the threat model coverage, side-channel analysis, measurement methodology, workloads, or baseline comparison setup, making it impossible to assess whether the number substantiates the security-plus-performance claim.
minor comments (1)
- [Abstract] The acronym 'cVMs' is used before its expansion ('confidential virtual machines') is provided, which could be clarified on first use for readability.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review. The comments correctly identify areas where the abstract and security discussion could be strengthened with additional clarification. We address each point below and have revised the manuscript to incorporate explicit discussion of threat model boundaries and expanded evaluation details.
read point-by-point responses
- Referee: [Abstract] The central claim of 'strong system-level isolation' protecting against software attacks from malicious device owners rests on Arm CCA attestation and cVM mediation, but the manuscript provides no concrete evidence or analysis ruling out side-channel leakage (e.g., via shared caches or memory controllers during LLM inference) or attestation bypasses; this leaves the confidentiality guarantee only partially supported.
Authors: We appreciate the referee highlighting the need for clearer boundaries on our security claims. Section 3 defines the threat model as software attacks originating from a malicious device owner (e.g., via compromised hypervisor or rich OS), which Arm CCA is designed to mitigate through hardware-enforced cVM isolation and attestation. The attestation mechanism relies on the hardware root of trust to verify cVM images and configurations, preventing bypasses at the software level. However, we agree that the manuscript does not explicitly analyze hardware side-channels such as cache or memory-controller leakage during LLM inference. We have revised the Security Analysis section to add a dedicated paragraph stating that our guarantees are scoped to software attacks, that side-channel resistance is not claimed without additional mitigations (e.g., constant-time code or cache partitioning), and that such attacks remain an open consideration for future work, with references to related literature on confidential computing side-channels. This revision clarifies the scope without overstating the guarantees. revision: yes
- Referee: [Abstract/Evaluation] The reported '< 5.15% runtime overhead' is presented as a key practicality result, yet no details are given on the threat model coverage, side-channel analysis, measurement methodology, workloads, or baseline comparison setup, making it impossible to assess whether the number substantiates the security-plus-performance claim.
Authors: We acknowledge that the abstract is too brief and does not direct readers to the supporting details. The full evaluation (Section 5) specifies the workloads (representative LLM agent pipelines involving tool invocation and multi-turn reasoning with models such as Llama-7B), the baseline (identical agent deployment using standard Linux multi-process isolation on the same Arm edge hardware), and the measurement approach (cycle-accurate timers with repeated runs to report average overhead). Threat model coverage is described in Section 3. Side-channel considerations are now addressed in the revised Security Analysis section as noted above. We have updated the abstract to include a concise reference to the evaluation setup and added a summary table in Section 5 that explicitly lists methodology, workloads, and baseline configuration. These changes make the reported overhead verifiable while preserving the original experimental results. revision: yes
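The headline overhead figure is a ratio of mean runtimes between the cVM deployment and the multi-process baseline. As a sketch with invented timings (not the paper's measurements), the arithmetic behind a "< 5.15%" claim looks like this:

```python
from statistics import mean

def runtime_overhead_pct(cvm_runs, baseline_runs):
    """Percent slowdown of the cVM deployment relative to the baseline,
    computed from repeated-run averages."""
    return (mean(cvm_runs) / mean(baseline_runs) - 1.0) * 100.0

# Illustrative per-run times in seconds (made up, not the paper's data):
baseline = [10.0, 10.2, 9.8]   # commodity OS multi-process deployment
cvm = [10.4, 10.6, 10.2]       # multi-cVM deployment of the same pipeline
print(round(runtime_overhead_pct(cvm, baseline), 2))  # 4.0
```

A single percentage of this form is only as informative as the workloads and baseline behind it, which is exactly what the referee's comment asks the authors to document.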
Circularity Check
No circularity: engineering system relies on external hardware and empirical benchmarks
full rationale
The paper presents a system architecture for confidential LLM agent execution on edge devices using Arm CCA cVMs and explicit mediation channels. No equations, fitted parameters, or self-referential definitions appear in the provided text. Security and performance claims rest on the external properties of Arm CCA attestation plus direct timing comparisons to commodity OS multi-process setups, without any reduction of the central result to its own inputs by construction. No self-citations are load-bearing for the isolation guarantees, and the work contains no ansatzes, uniqueness theorems, or renamings of prior results that would trigger the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Arm CCA provides hardware-enforced isolation and remote attestation for confidential VMs that is sufficient against software attacks on edge devices.
invented entities (1)
- Explicit verifiable communication channels between cVMs (no independent evidence)
Reference graph
Works this paper leans on
- [1] 2025. kvmtool-cca. https://gitlab.arm.com/linux-arm/kvmtool-cca/-/tree/cca/v3?ref_type=heads Accessed Feb 2025.
- [3] Sina Abdollahi, Amir Al Sadi, Marios Kogias, David Kotz, and Hamed Haddadi. 2025. Confidential, Attestable, and Efficient Inter-CVM Communication with Arm CCA. arXiv preprint arXiv:2512.01594 (2025).
- [6] Android. 2025. AVF architecture. https://source.android.com/docs/core/virtualization/architecture#memory-ownership Accessed July 2025.
- [7] Anthropic. 2025. Effective context engineering for AI agents. https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents Accessed Feb 2026.
- [8] Apple Inc. 2025. Secure Enclave. https://support.apple.com/en-gb/guide/security/sec59b0b31ff/web Accessed Feb 2025.
- [9] Arm Limited. 2025. Arm Confidential Compute Architecture. https://www.arm.com/architecture/security-features/arm-confidential-compute-architecture Accessed Feb 2025.
- [10] Arm Limited. 2025. Learn the architecture - TrustZone for AArch64. https://developer.arm.com/documentation/102418/latest/ Accessed Feb 2025.
- [11] Arm Limited. 2025. linux-cca. https://gitlab.arm.com/linux-arm/linux-cca/-/commit/fad35572db Accessed Feb 2025.
- [13] Ferdinand Brasser, David Gens, Patrick Jauernig, Ahmad-Reza Sadeghi, and Emmanuel Stapf. 2019. SANCTUARY: ARMing TrustZone with User-space Enclaves. In NDSS.
- [14] Wei Chen, Zhiyuan Li, Zhen Guo, and Yikang Shen. 2025. Octo-planner: On-device language model for planner-action agents. In International Workshop on Engineering Multi-Agent Systems. Springer, 141–156.
- [15] Edoardo Debenedetti, Ilia Shumailov, Tianqi Fan, Jamie Hayes, Nicholas Carlini, Daniel Fabian, Christoph Kern, Chongyang Shi, Andreas Terzis, and Florian Tramèr. 2025. Defeating prompt injections by design. arXiv preprint arXiv:2503.18813 (2025).
- [16] Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. 2022. Gpt3.int8(): 8-bit matrix multiplication for transformers at scale. Advances in Neural Information Processing Systems 35 (2022), 30318–30332.
- [18] Embrace The Red. 2025. Claude Code: Data Exfiltration with DNS (CVE-2025-55284). https://embracethered.com/blog/posts/2025/claude-code-exfiltration-via-dns-requests/ Accessed Feb 2026.
- [19] Embrace The Red. 2025. How Devin AI Can Leak Your Secrets via Multiple Means. https://embracethered.com/blog/posts/2025/devin-can-leak-your-secrets/ Accessed Feb 2026.
- [20] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. 2016. CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy. In International Conference on Machine Learning. PMLR, 201–210.
- [21] Google Cloud. 2026. Use system instructions (Generative AI on Vertex AI). https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instructions Accessed Feb 2026.
- [22] Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. 2023. Not what you've signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 79–90.
- [23] Hugging Face. [n. d.]. bartowski/Llama-3.2-1B-Instruct-GGUF. https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF Accessed Feb 2025.
- [24] Hugging Face. 2019. openai-community/gpt2-medium. https://huggingface.co/openai-community/gpt2-medium Accessed Feb 2025.
- [25] Bo Hui, Haolin Yuan, Neil Gong, Philippe Burlina, and Yinzhi Cao. 2024. PLeak: Prompt leaking attacks against large language model applications. In Proceedings of the 2024 ACM SIGSAC Conference on Computer and Communications Security. 3600–3614.
- [27] Intel. 2025. Intel Software Guard Extensions. Retrieved June 7, 2025 from https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html
- [29] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient memory management for large language model serving with PagedAttention. In Proceedings of the 29th Symposium on Operating Systems Principles. 611–626.
- [31] LangChain. 2026. LangChain: Build context-aware, reasoning applications with LangChain's flexible abstractions and AI-first toolkit. https://www.langchain.com/ Accessed Feb 2026.
- [32] Linaro. 2024. MAD24-410 Arm Confidential Compute Architecture open-source enablement update. Retrieved March 9, 2025 from https://resources.linaro.org/en/resource/rEjhEezEvnNMC3LALzUTrr
- [33] Linux. 2025. seccomp(2) — Linux manual page. https://man7.org/linux/man-pages/man2/seccomp.2.html Accessed Feb 2026.
- [35] Mohammad M Maheri, Sunil Cotterill, Alex Davidson, and Hamed Haddadi. 2025. ZK-APEX: Zero-Knowledge Approximate Personalized Unlearning with Executable Proofs. arXiv preprint arXiv:2512.09953 (2025).
- [37] Microsoft 365. [n. d.]. Microsoft 365 Copilot hub. https://learn.microsoft.com/en-us/copilot/microsoft-365/ Accessed Feb 2026.
- [38] Fan Mo, Hamed Haddadi, Kleomenis Katevas, Eduard Marin, Diego Perino, and Nicolas Kourtellis. 2021. PPFL: Privacy-preserving federated learning with trusted execution environments. In Proceedings of the 19th Annual International Conference on Mobile Systems, Applications, and Services. 94–108.
- [39] OpenAI Developers. [n. d.]. Using tools. https://developers.openai.com/api/docs/guides/tools Accessed Feb 2026.
- [41] Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2023. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. arXiv preprint arXiv:2307.16789 (2023).
- [42] Radxa. [n. d.]. ROCK 5B. Retrieved April 14, 2025 from https://radxa.com/products/rock5/5b/
- [43] M Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M Songhori, Thomas Schneider, and Farinaz Koushanfar. 2018. Chameleon: A hybrid secure computation framework for machine learning applications. In Proceedings of the 2018 Asia Conference on Computer and Communications Security. 707–721.
- [44] Fan Sang, Jaehyuk Lee, Xiaokuan Zhang, and Taesoo Kim. 2025. PORTAL: Fast and Secure Device Access with Arm CCA for Modern Arm Mobile System-on-Chips (SoCs). In 2025 IEEE Symposium on Security and Privacy (SP). IEEE, 4099–4116.
- [45] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36 (2023), 68539–68551.
- [46] Tianxiang Shen, Ji Qi, Jianyu Jiang, Xian Wang, Siyuan Wen, Xusheng Chen, Shixiong Zhao, Sen Wang, Li Chen, Xiapu Luo, et al. 2022. SOTER: Guarding Black-box Inference for General Neural Networks at the Edge. In 2022 USENIX Annual Technical Conference (USENIX ATC 22). 723–738.
- [47] Sandra Siby, Sina Abdollahi, Mohammad Maheri, Marios Kogias, and Hamed Haddadi. 2024. GuaranTEE: Towards Attestable and Private ML with CCA. In Proceedings of the 4th Workshop on Machine Learning and Systems. 1–9.
- [48] Supraja Sridhara, Andrin Bertschi, Benedict Schlüter, Mark Kuhne, Fabio Aliberti, and Shweta Shinde. 2024. ACAI: Extending Arm Confidential Computing Architecture Protection from CPUs to Accelerators. In 33rd USENIX Security Symposium (USENIX Security '24).
- [49] Lizhi Sun, Shuocheng Wang, Hao Wu, Yuhang Gong, Fengyuan Xu, Yunxin Liu, Hao Han, and Sheng Zhong. 2022. LEAP: TrustZone Based Developer-Friendly TEE for Intelligent Mobile Apps. IEEE Transactions on Mobile Computing (2022).
- [50] Zhichuang Sun, Ruimin Sun, Changming Liu, Amrita Roy Chowdhury, Long Lu, and Somesh Jha. 2023. ShadowNet: A secure and efficient on-device model inference system for convolutional neural networks. In 2023 IEEE Symposium on Security and Privacy (SP). IEEE, 1596–1612.
- [51] Trusted Firmware. 2025. TF-A. https://www.trustedfirmware.org/projects/tf-a Accessed Feb 2025.
- [52] Trusted Firmware. 2025. TF-RMM. https://www.trustedfirmware.org/projects/tf-rmm Accessed Feb 2025.
- [54] Chenxu Wang, Fengwei Zhang, Yunjie Deng, Kevin Leach, Jiannong Cao, Zhenyu Ning, Shoumeng Yan, and Zhengyu He. 2024. CAGE: Complementing Arm CCA with GPU Extensions. In Network and Distributed System Security (NDSS) Symposium.
- [56] Wikipedia. [n. d.]. Widevine. https://en.wikipedia.org/wiki/Widevine Accessed Feb 2026.
- [57] Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. 2025. I Know What You Asked: Prompt Leakage via KV-Cache Sharing in Multi-Tenant LLM Serving. In NDSS.
- [59] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. 2022. ReAct: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations.
- [60] Mengxia Yu, De Wang, Qi Shan, Colorado J Reed, and Alvin Wan.
discussion (0)