OpenPCC: Open and Confidential LLM Serving on Commodity TEEs

Chao Wang (1); Haoling Zhou (1); Shixuan Zhao (1); Zhiqiang Lin (1) ((1) The Ohio State University)

arxiv: 2606.11145 · v1 · pith:TS6VNE6Xnew · submitted 2026-06-09 · 💻 cs.CR

OpenPCC: Open and Confidential LLM Serving on Commodity TEEs

Haoling Zhou (1) , Shixuan Zhao (1) , Chao Wang (1) , Zhiqiang Lin (1) ((1) The Ohio State University) This is my paper

Pith reviewed 2026-06-27 12:27 UTC · model grok-4.3

classification 💻 cs.CR

keywords confidential computingtrusted execution environmentsLLM inferencecloud securityopen sourceprivacy protectionvLLMTEE

0 comments

The pith

OpenPCC enables confidential LLM cloud serving on commodity TEEs without proprietary hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies the need for open, secure cloud inference services that protect sensitive user data in LLM applications. It proposes OpenPCC, a framework using standard trusted execution environments to achieve confidentiality in an open-source manner. A prototype is built and tested on Llama-3 8B with vLLM, separating the framework's costs from TEE hardware. This approach addresses limitations in existing proprietary solutions like Apple PCC, which cannot be adopted broadly and have design issues. Demonstrating feasibility means others can implement similar private AI services on available hardware.

Core claim

We present OpenPCC, a Confidential CIS framework that does not rely on proprietary hardware but instead uses commercially available TEEs. We implement an open-source prototype and characterize it end-to-end on a Llama-3 8B vLLM workload, separating OpenPCC's own cost from the underlying TEE hardware. Our analysis and evaluation demonstrated the feasibility and security of the system.

What carries the argument

OpenPCC framework for enforcing confidentiality and isolation in LLM inference using commodity TEEs.

Load-bearing premise

Commodity TEEs deliver the required privacy protection for LLM inference without design glitches, and the open-source implementation correctly enforces isolation and confidentiality.

What would settle it

Discovery of a data leak or isolation failure in the OpenPCC prototype running the Llama-3 workload would falsify the security and feasibility claims.

Figures

Figures reproduced from arXiv: 2606.11145 by Chao Wang (1), Haoling Zhou (1), Shixuan Zhao (1), Zhiqiang Lin (1) ((1) The Ohio State University).

**Figure 1.** Figure 1: OpenPcc components and the trust boundary around the inference node. key stays in the CVM, while the public key is committed into a composite attestation token that covers both TEEs, a technique which we discuss later in subsection 4.1. User prompts and completions are decrypted only inside this boundary, addressing P1, and the per-request user data, including KVcache entries, is wiped before the respons… view at source ↗

**Figure 2.** Figure 2: OpenPcc workflow, showing how the parties interact in an inference session. public key is the only credential the node advertises. Llama-3 8B [25] is served by vLLM 0.8.5+ [41] inside the same VM. The inference node does not implement attestation logic itself. Instead, on every attestation request it invokes Intel’s Trust Authority Client [18], which collects the TDX quote through ConfigFS-TSM, collects t… view at source ↗

**Figure 3.** Figure 3: OpenPcc trust pipeline microbenchmark. (a) Per-request operations paid on every inference. (b) Session setup paid once per attestation report cache TTL. Methodology. For the end-to-end Llama-3 8B workload shown in subsection 6.3, every data point is the median over 50 measurement requests after a 10-request warm-up. We report both p50 (the typical case) and p99 (the slow tail case) wherever a tail matters.… view at source ↗

**Figure 4.** Figure 4: p50 TTFT across the 4×4 prompt×batch matrix at 128-token completions. (a) Plain served with gRPC, gateway, and no TEEs. (b) Full OpenPcc stack. A cell in (b) shows both the absolute TTFT and the percentage increase relative to the plain served configuration. Includes the gateway hop, AES-GCM round-trip, GPU prefill, and the first decode step. • Decode throughput (tokens/s): steady-state throughput of the r… view at source ↗

**Figure 5.** Figure 5: Llama-3 8B p50 decode throughput across all three configurations. the gateway. Service-provider/platform-operator collusion is folded into the platform-operator case because it does not give the attacker any capability that the silicon roots do not already mediate. Side channels and denial of service are out of scope per subsection 3.4. 7.1 Malicious Service Provider P1 and P2 mentioned in subsection 3.3 r… view at source ↗

read the original abstract

Generative AI applications such as personal AI agents, image generators, and chat assistants offer advanced capabilities to improve user experience. Behind the scenes, Large Language Models (LLMs) that power these services require a massive amount of computation and are usually deployed in the cloud, available as APIs, meaning that a user's request has to be sent to a Cloud Inference Service (CIS) for processing. However, the strong capabilities of LLM also mean that user's requests now contain much more personal sensitive or enterprise confidential information, demanding equally strong protection in CIS. While early industry efforts such as Apple Private Cloud Compute (PCC) and Google Private AI Compute have emerged to show the potential of secure CIS, they are not adoptable for deployment by others due to their reliance on proprietary hardware and closed ecosystem. In addition, they all suffer from their own design glitches that can undermine the ambitious goal of bringing in true privacy protection to end users. In this paper, we present our analysis of the fundamental requirements of building a secure yet open CIS. We then present OpenPCC, a Confidential CIS framework that does not rely on proprietary hardware but instead uses commercially available TEEs. We implement an open-source prototype and characterize it end-to-end on a Llama-3 8B vLLM workload, separating OpenPCC's own cost from the underlying TEE hardware. Our analysis and evaluation demonstrated the feasibility and security of the system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

OpenPCC offers a usable open-source path to confidential LLM serving on standard TEEs, but the security evaluation is too thin to confirm isolation holds for the vLLM workload.

read the letter

The core contribution is a framework and prototype that replaces proprietary hardware stacks with commodity TEEs for confidential cloud inference. They lay out the requirements for a secure CIS, implement an open version, run it end-to-end on Llama-3 8B with vLLM, and separate their own overhead from the TEE baseline. That design choice and the open release are the parts that actually move the needle for people who want to deploy this without depending on one vendor.

The evaluation is the main soft spot. The abstract states that analysis and evaluation demonstrated feasibility and security, yet it gives no numbers on side-channel resistance, attestation coverage for the full inference stack, or checks against page-fault or cache-timing leaks under realistic traffic. The stress-test concern lands here: without those targeted tests, the claim that the prototype correctly confines weights, KV cache, and outputs rests on the assumption that the chosen TEEs plus the integration code do the job, which is not shown in the provided evidence.

This paper is for researchers and engineers working on practical confidential AI serving who need something they can actually build on. A reader looking for a starting implementation and cost breakdown will find value; anyone needing rigorous proof that confidentiality survives real attacks will need the missing data.

I would send it to peer review. The topic is relevant and the open prototype is a concrete step, but the security section will require substantial strengthening before the central claim can be trusted.

Referee Report

3 major / 1 minor

Summary. The paper introduces OpenPCC, a framework for confidential cloud inference services (CIS) for LLMs that relies on commodity TEEs rather than proprietary hardware. It analyzes fundamental requirements for secure open CIS, implements an open-source prototype, and evaluates it end-to-end on a Llama-3 8B vLLM workload while separating OpenPCC overhead from underlying TEE costs, claiming that the analysis and evaluation demonstrate feasibility and security.

Significance. If the security and isolation properties are rigorously established, the work would provide a practical, auditable alternative to closed systems such as Apple Private Cloud Compute, enabling broader deployment of privacy-preserving LLM inference on standard hardware and addressing a key barrier to confidential AI services.

major comments (3)

[Abstract] The abstract asserts that 'analysis and evaluation demonstrated the feasibility and security of the system,' yet the provided description supplies no quantitative security metrics, threat-model coverage, or experimental evidence of isolation (e.g., side-channel resistance or attestation results); this leaves the central security claim unsupported by visible data.
[Evaluation] The evaluation on the Llama-3 8B vLLM workload separates OpenPCC cost from TEE hardware but does not report targeted checks for cache-timing, page-fault, or integration side channels that would confirm confinement of model weights, KV cache, prompts, and outputs inside the TEE boundary under realistic inference traffic.
[Security Analysis] The claim that commodity TEEs avoid the design glitches identified in proprietary systems is asserted without a concrete mapping of those glitches to the chosen TEEs or a verification that the open-source integration correctly enforces the required isolation properties for the full vLLM stack.

minor comments (1)

Notation for TEE-specific overhead components should be defined consistently across text and figures to improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below with clarifications from the manuscript and indicate the revisions we will make.

read point-by-point responses

Referee: [Abstract] The abstract asserts that 'analysis and evaluation demonstrated the feasibility and security of the system,' yet the provided description supplies no quantitative security metrics, threat-model coverage, or experimental evidence of isolation (e.g., side-channel resistance or attestation results); this leaves the central security claim unsupported by visible data.

Authors: We agree the abstract phrasing is too strong. The security argument rests on the requirements analysis (Section 3) showing how commodity TEEs satisfy the isolation and attestation properties needed for confidential CIS, together with the open prototype that places the full vLLM stack inside the TEE. No new quantitative side-channel or attestation experiments were performed; the evaluation measures only performance overhead. We will revise the abstract to read that the analysis and evaluation demonstrate feasibility, with security properties inherited from the chosen TEEs. This change will be made. revision: yes
Referee: [Evaluation] The evaluation on the Llama-3 8B vLLM workload separates OpenPCC cost from TEE hardware but does not report targeted checks for cache-timing, page-fault, or integration side channels that would confirm confinement of model weights, KV cache, prompts, and outputs inside the TEE boundary under realistic inference traffic.

Authors: The evaluation (Section 5) deliberately isolates OpenPCC software overhead from TEE hardware cost on the Llama-3 8B workload; it does not include new side-channel measurements. Confinement is enforced by the TEE boundary and its attestation, which we treat as given under the standard TEE threat model. We will add a short discussion subsection that enumerates the relevant side-channel vectors (cache timing, page faults) and explains why they are outside the OpenPCC threat model once the model and KV cache reside inside the enclave. Full empirical side-channel testing remains outside the scope of this paper, so the revision will be partial. revision: partial
Referee: [Security Analysis] The claim that commodity TEEs avoid the design glitches identified in proprietary systems is asserted without a concrete mapping of those glitches to the chosen TEEs or a verification that the open-source integration correctly enforces the required isolation properties for the full vLLM stack.

Authors: Section 4 presents a requirements-driven comparison that identifies specific glitches in closed systems (e.g., limited auditability, hardware lock-in) and shows how commodity TEEs plus open attestation address them. A more explicit mapping was omitted for brevity. We will insert a concise table that links each cited proprietary glitch to the corresponding commodity-TEE mechanism and will add a paragraph clarifying the vLLM integration points that keep weights, KV cache, and I/O inside the enclave. These additions will be included in the revision. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on new design, implementation, and evaluation

full rationale

The paper describes analysis of CIS requirements, followed by presentation of OpenPCC using commodity TEEs, an open-source prototype, and end-to-end characterization on Llama-3 8B vLLM (separating OpenPCC overhead from TEE cost). No equations, fitted parameters, predictions, or self-citations appear as load-bearing elements in the abstract or described structure. The central claims derive from the implemented system and measurements rather than reducing to inputs by construction. This is a standard systems paper with independent empirical content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no free parameters, axioms, or invented entities are identifiable or required for the high-level claim.

pith-pipeline@v0.9.1-grok · 5803 in / 1136 out tokens · 26115 ms · 2026-06-27T12:27:14.343340+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 8 canonical work pages

[1]

AMD Memory Encryption.https://docs.amd.com/api/khub/ documents/ZcsCCmeL80dbtuf_VlGpvw/content

AMD. AMD Memory Encryption.https://docs.amd.com/api/khub/ documents/ZcsCCmeL80dbtuf_VlGpvw/content
[2]

Make your llm fully utilize the context

Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, and Weizhu Chen. Make your llm fully utilize the context. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tom- czak, and C. Zhang, editors,Advances in Neural Information Process- ing Systems, volume 37, pages 62160–62188. Curran Associates, Inc.,
[3]

URL:https://proceedings.neurips.cc/paper_files/paper/2024/ file/71c3451f6cd6a4f82bb822db25cea4fd-Paper-Conference.pdf, doi: 10.52202/079017-1986

work page doi:10.52202/079017-1986 2024
[4]

Privacy-preserving llm infer- ence in practice: A comparative survey of techniques, trade-offs, and deployability, 2026

Davide Andreoletti, Alessandro Rudi, Emanuele Carpanzano, Francesco Lelli, and Tiziano Leidi. Privacy-preserving llm infer- ence in practice: A comparative survey of techniques, trade-offs, and deployability, 2026. URL:https://api.semanticscholar.org/CorpusID: 285144126

2026
[5]

Confidential Inference Systems.https://assets.anthropic

Anthropic. Confidential Inference Systems.https://assets.anthropic. com/m/c52125297b85a42/original/Confidential_Inference_Paper. pdf, 2025

2025
[6]

iCloud Private Relay Overview.https://www.apple.com/ privacy/docs/iCloud_Private_Relay_Overview_Dec2021.PDF, 2021

Apple. iCloud Private Relay Overview.https://www.apple.com/ privacy/docs/iCloud_Private_Relay_Overview_Dec2021.PDF, 2021

2021
[7]

Longbench: A bilingual, multitask benchmark for long context understanding, 2024

Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li. Longbench: A bilingual, multitask benchmark for long context understanding, 2024. URL:https://arxiv.org/abs/2308. 14508,arXiv:2308.14508

Pith/arXiv arXiv 2024
[8]

CPA Canada. WEBTRUST FOR CERTIFICATION AUTHORITIES PRINCIPLES AND CRITERIA.https://www.cpacanada.ca/business- and-accounting-resources/audit-and-assurance/overview-of- webtrust-services/principles-and-criteria, 2023

2023
[9]

Sprint: Scalable secure & differentially private inference for transformers

Francesco Capano, Jonas Böhler, and Benjamin Weggenmann. Sprint: Scalable secure & differentially private inference for transformers. Proceedings on Privacy Enhancing Technologies, 2026

2026
[10]

Dissecting cpu-gpu unified physical memory on amd mi300a apus,

Marcin Chrapek, Marcin Copik, Etienne Mettaz, and Torsten Hoefler. Confidential llm inference: Performance and cost across cpu and gpu tees, 2025.doi:10.1109/IISWC66894.2025.00017

work page doi:10.1109/iiswc66894.2025.00017 2025
[11]

Open challenges in multi- agent security: Towards secure systems of interacting ai agents.arXiv preprint arXiv:2505.02077, 2025

Christian Schroeder de Witt, Klaudia Krawiecka, Igor Krawczuk, Ben Hagag, William L Anderson, Peter Belcak, Ben Bucknall, Xiaohong Cai, Ayush Chopra, Doron Cohen, et al. Open challenges in multi- agent security: Towards secure systems of interacting ai agents.arXiv preprint arXiv:2505.02077, 2025

Pith/arXiv arXiv 2025
[12]

DeepSeek-V3.https://huggingface.co/deepseek-ai/ DeepSeek-V3

Deepseek. DeepSeek-V3.https://huggingface.co/deepseek-ai/ DeepSeek-V3
[13]

Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

2025
[14]

Google Private AI Compute.https://services.google.com/fh/ files/misc/private_ai_compute_technical_brief.pdf, 2025

Google. Google Private AI Compute.https://services.google.com/fh/ files/misc/private_ai_compute_technical_brief.pdf, 2025

2025
[15]

Le, Hani Jamjoom, Shixuan Zhao, and Zhiqiang Lin

Zhongshu Gu, Enriquillo Valdez, Salman Ahmed, Julian James Stephen, Michael V. Le, Hani Jamjoom, Shixuan Zhao, and Zhiqiang Lin. Blue- print, bootstrap, and bridge: A security look at nvidia gpu confiden- tial computing, 2025. URL:https://api.semanticscholar.org/CorpusID: 280150420

2025
[16]

Security of ai agents

Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, and Hao Chen. Security of ai agents. In2025 IEEE/ACM International Workshop on Responsible AI Engineering (RAIE), pages 45–52. IEEE, 2025

2025
[17]

Shokri, Vitaly Shmatikov, and Emmett Witchel

Tyler Hunt, Congzheng Song, R. Shokri, Vitaly Shmatikov, and Emmett Witchel. Chiron: Privacy-preserving machine learning as a service,
[18]

URL:https://api.semanticscholar.org/CorpusID:3970945
[19]

Intel Trust Domain Execution.https://www.intel.com/content/ www/us/en/products/docs/accelerator-engines/trust-domain- extensions.html

Intel. Intel Trust Domain Execution.https://www.intel.com/content/ www/us/en/products/docs/accelerator-engines/trust-domain- extensions.html
[20]

Intel Trust Authority Client for Python.https://github

Intel. Intel Trust Authority Client for Python.https://github. com/intel/trustauthority-client-for-python, 2025. Includes the trustauthority-pycli CLI for the TDX + NVIDIA H100 composite attestation profile

2025
[21]

GPU remote attestation with Intel Trust Authority.https: //docs.trustauthority.intel.com/main/articles/articles/ita/concept- gpu-attestation.html, 2026

Intel. GPU remote attestation with Intel Trust Authority.https: //docs.trustauthority.intel.com/main/articles/articles/ita/concept- gpu-attestation.html, 2026

2026
[22]

Je Chiao Ku and Shang Liang Chen. The deployment and implemen- tation of cloud platform for remote automatic correction of artifi- cial intelligence models.IEEE Transactions on Industrial Informatics, 21(4):3466–3474, 2025.doi:10.1109/TII.2025.3528563

work page doi:10.1109/tii.2025.3528563 2025
[23]

Gonzalez, Hao Zhang, and Ion Sto- ica

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Sto- ica. Efficient memory management for large language model serving with pagedattention. InProceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP), 2023

2023
[24]

Cipherleaks: Breaking constant-time cryptography on amd sev via the ciphertext side channel, 2021

Mengyuan Li, Yinqian Zhang, Huibo Wang, Kang Li, and Yueqiang Cheng. Cipherleaks: Breaking constant-time cryptography on amd sev via the ciphertext side channel, 2021. URL:https://api.semanticscholar. org/CorpusID:237522096

2021
[25]

Shadow in the cache: Unveiling and mitigating pri- vacy risks of kv-cache in llm inference.arXiv preprint arXiv:2508.09442, 2025

Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, and Zhan Qin. Shadow in the cache: Unveiling and mitigating privacy risks of kv-cache in llm inference. 08 2025. doi:10.48550/arXiv.2508.09442

work page doi:10.48550/arxiv.2508.09442 2025
[26]

Meta-Llama-3-70B.https://huggingface.co/meta-llama/Meta- Llama-3-70B

Meta. Meta-Llama-3-70B.https://huggingface.co/meta-llama/Meta- Llama-3-70B
[27]

Meta-Llama-3-8B.https://huggingface.co/meta-llama/Meta- Llama-3-8B

Meta. Meta-Llama-3-8B.https://huggingface.co/meta-llama/Meta- Llama-3-8B
[28]

Mistral-Large-Instruct-2407.https://huggingface.co/mistralai/ Mistral-Large-Instruct-2407

Mistral. Mistral-Large-Instruct-2407.https://huggingface.co/mistralai/ Mistral-Large-Instruct-2407
[29]

Thor: Secure transformer inference with homomorphic encryption

Jungho Moon, Dongwoo Yoo, Xiaoqian Jiang, and Miran Kim. Thor: Secure transformer inference with homomorphic encryption. InPro- ceedings of the 2025 ACM SIGSAC Conference on Computer and Com- munications Security, CCS ’25, page 3765–3779, New York, NY, USA,

2025
[30]

doi:10.1145/3719027

Association for Computing Machinery. doi:10.1145/3719027. 3765150

work page doi:10.1145/3719027
[31]

Deployment Guide for Confidential Computing.https: //docs.nvidia.com/cc-deployment-guide-tdx.pdf

NVIDIA. Deployment Guide for Confidential Computing.https: //docs.nvidia.com/cc-deployment-guide-tdx.pdf
[32]

Hopper Multi-GPU (PPCIE) Attestation Ex- ample.https://docs.nvidia.com/attestation/quick-start- guide/latest/attestation-examples/hopper_ppcie.html

NVIDIA. Hopper Multi-GPU (PPCIE) Attestation Ex- ample.https://docs.nvidia.com/attestation/quick-start- guide/latest/attestation-examples/hopper_ppcie.html
[33]

NVIDIA Confidential Computing.https://www.nvidia.com/ en-us/data-center/solutions/confidential-computing/

NVIDIA. NVIDIA Confidential Computing.https://www.nvidia.com/ en-us/data-center/solutions/confidential-computing/
[34]

SPDM.https://docs.nvidia.com/networking/display/ nvidianvosusermanualfornvlinkswitchesv25022141/spdm

NVIDIA. SPDM.https://docs.nvidia.com/networking/display/ nvidianvosusermanualfornvlinkswitchesv25022141/spdm
[35]

Attestation using NRAS.https://docs.nvidia.com/attestation/ poc-to-production/latest/integration-options/remote_verifier.html, 2026

NVIDIA. Attestation using NRAS.https://docs.nvidia.com/attestation/ poc-to-production/latest/integration-options/remote_verifier.html, 2026

2026
[36]

AMD SEV-SNP Attestation: Establishing Trust in Guests.https://www.amd.com/content/dam/amd/en/documents/ developer/lss-snp-attestation.pdf, 2022

Jeremy Powell. AMD SEV-SNP Attestation: Establishing Trust in Guests.https://www.amd.com/content/dam/amd/en/documents/ developer/lss-snp-attestation.pdf, 2022

2022
[37]

A multi-llm orchestration engine for personalized, context-rich assistance, 2024

Sumedh Rasal. A multi-llm orchestration engine for personalized, context-rich assistance, 2024. URL:https://arxiv.org/abs/2410.10039, arXiv:2410.10039

arXiv 2024
[38]

PCC Hardware Design.https: //security.apple.com/documentation/private-cloud-compute/ hardwareintegrity#Hardware-design, 2024

Apple Security Research. PCC Hardware Design.https: //security.apple.com/documentation/private-cloud-compute/ hardwareintegrity#Hardware-design, 2024

2024
[39]

Private Cloud Compute: A new frontier for AI privacy in the cloud - Apple Security Research.https://security

Apple Security Research. Private Cloud Compute: A new frontier for AI privacy in the cloud - Apple Security Research.https://security. apple.com/blog/private-cloud-compute/, 2024

2024
[40]

Machine learning orchestration in cloud environments: Automating the training and deployment of 13 Zhou et al

I Sakthidevi, G Vinoth Rajkumar, R Sunitha, A Sangeetha, R San- thana Krishnan, and S Sundararajan. Machine learning orchestration in cloud environments: Automating the training and deployment of 13 Zhou et al. distributed machine learning ai model. In2023 7th International Con- ference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC), pages...

2023
[41]

Trusted yet flexible: High-level runtimes for secure ml inference in tees.Journal of Cybersecurity and Privacy, 6(1), 2026

Nikolaos-Achilleas Steiakakis and Giorgos Vasiliadis. Trusted yet flexible: High-level runtimes for secure ml inference in tees.Journal of Cybersecurity and Privacy, 6(1), 2026. URL:https://www.mdpi.com/ 2624-800X/6/1/23,doi:10.3390/jcp6010023

work page doi:10.3390/jcp6010023 2026
[42]

Martin Thomson and Christopher A. Wood. RFC 9458: Oblivious HTTP.https://www.rfc-editor.org/rfc/rfc9458.html
[43]

Jean-Baptiste Truong, William Gallagher, Tian Guo, and Robert J. Walls. Memory-efficient deep learning inference in trusted execution environments, 2021.doi:10.1109/IC2E52221.2021.00031

work page doi:10.1109/ic2e52221.2021.00031 2021
[44]

vLLM.https://pypi.org/project/vllm/

vllm.ai. vLLM.https://pypi.org/project/vllm/
[45]

I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant LLM serving, 2025

Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant LLM serving, 2025. URL: https://www.ndss-symposium.org/ndss-paper/i-know-what-you- asked-prompt-leakage-via-kv-cache-sharing-in-multi-tenant-llm- serving/

2025
[46]

I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant llm serving.Proceedings 2025 Network and Distributed System Security Symposium, 2025

Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant llm serving.Proceedings 2025 Network and Distributed System Security Symposium, 2025. URL:https: //api.semanticscholar.org/CorpusID:276842968

2025
[47]

Gpu travelling: Efficient confidential collaborative training with tee-enabled gpus, 2025

Shixuan Zhao, Zhongshu Gu, Salman Ahmed, Enriquillo Valdez, Hani Jamjoom, and Zhiqiang Lin. Gpu travelling: Efficient confidential collaborative training with tee-enabled gpus, 2025. doi:10.1145/ 3719027.3765029

arXiv 2025
[48]

Too private to tell: Practical token theft attacks on apple intelligence, 2026

Haoling Zhou, Shixuan Zhao, Chao Wang, and Zhiqiang Lin. Too private to tell: Practical token theft attacks on apple intelligence, 2026. URL:https://arxiv.org/abs/2604.15637,arXiv:2604.15637

Pith/arXiv arXiv 2026
[49]

Confidential Computing on NVIDIA Hopper GPUs: A Per- formance Benchmark Study, September 2024

Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, and Shunfan Zhou. Confidential Computing on NVIDIA Hopper GPUs: A Per- formance Benchmark Study, September 2024. arXiv:2409.03992, doi:10.48550/arXiv.2409.03992. 14

work page doi:10.48550/arxiv.2409.03992 2024

[1] [1]

AMD Memory Encryption.https://docs.amd.com/api/khub/ documents/ZcsCCmeL80dbtuf_VlGpvw/content

AMD. AMD Memory Encryption.https://docs.amd.com/api/khub/ documents/ZcsCCmeL80dbtuf_VlGpvw/content

[2] [2]

Make your llm fully utilize the context

Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, and Weizhu Chen. Make your llm fully utilize the context. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tom- czak, and C. Zhang, editors,Advances in Neural Information Process- ing Systems, volume 37, pages 62160–62188. Curran Associates, Inc.,

[3] [3]

URL:https://proceedings.neurips.cc/paper_files/paper/2024/ file/71c3451f6cd6a4f82bb822db25cea4fd-Paper-Conference.pdf, doi: 10.52202/079017-1986

work page doi:10.52202/079017-1986 2024

[4] [4]

Privacy-preserving llm infer- ence in practice: A comparative survey of techniques, trade-offs, and deployability, 2026

Davide Andreoletti, Alessandro Rudi, Emanuele Carpanzano, Francesco Lelli, and Tiziano Leidi. Privacy-preserving llm infer- ence in practice: A comparative survey of techniques, trade-offs, and deployability, 2026. URL:https://api.semanticscholar.org/CorpusID: 285144126

2026

[5] [5]

Confidential Inference Systems.https://assets.anthropic

Anthropic. Confidential Inference Systems.https://assets.anthropic. com/m/c52125297b85a42/original/Confidential_Inference_Paper. pdf, 2025

2025

[6] [6]

iCloud Private Relay Overview.https://www.apple.com/ privacy/docs/iCloud_Private_Relay_Overview_Dec2021.PDF, 2021

Apple. iCloud Private Relay Overview.https://www.apple.com/ privacy/docs/iCloud_Private_Relay_Overview_Dec2021.PDF, 2021

2021

[7] [7]

Longbench: A bilingual, multitask benchmark for long context understanding, 2024

Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li. Longbench: A bilingual, multitask benchmark for long context understanding, 2024. URL:https://arxiv.org/abs/2308. 14508,arXiv:2308.14508

Pith/arXiv arXiv 2024

[8] [8]

CPA Canada. WEBTRUST FOR CERTIFICATION AUTHORITIES PRINCIPLES AND CRITERIA.https://www.cpacanada.ca/business- and-accounting-resources/audit-and-assurance/overview-of- webtrust-services/principles-and-criteria, 2023

2023

[9] [9]

Sprint: Scalable secure & differentially private inference for transformers

Francesco Capano, Jonas Böhler, and Benjamin Weggenmann. Sprint: Scalable secure & differentially private inference for transformers. Proceedings on Privacy Enhancing Technologies, 2026

2026

[10] [10]

Dissecting cpu-gpu unified physical memory on amd mi300a apus,

Marcin Chrapek, Marcin Copik, Etienne Mettaz, and Torsten Hoefler. Confidential llm inference: Performance and cost across cpu and gpu tees, 2025.doi:10.1109/IISWC66894.2025.00017

work page doi:10.1109/iiswc66894.2025.00017 2025

[11] [11]

Open challenges in multi- agent security: Towards secure systems of interacting ai agents.arXiv preprint arXiv:2505.02077, 2025

Christian Schroeder de Witt, Klaudia Krawiecka, Igor Krawczuk, Ben Hagag, William L Anderson, Peter Belcak, Ben Bucknall, Xiaohong Cai, Ayush Chopra, Doron Cohen, et al. Open challenges in multi- agent security: Towards secure systems of interacting ai agents.arXiv preprint arXiv:2505.02077, 2025

Pith/arXiv arXiv 2025

[12] [12]

DeepSeek-V3.https://huggingface.co/deepseek-ai/ DeepSeek-V3

Deepseek. DeepSeek-V3.https://huggingface.co/deepseek-ai/ DeepSeek-V3

[13] [13]

Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang. Ai agents under threat: A survey of key security challenges and future pathways.ACM Computing Surveys, 57(7):1–36, 2025

2025

[14] [14]

Google Private AI Compute.https://services.google.com/fh/ files/misc/private_ai_compute_technical_brief.pdf, 2025

Google. Google Private AI Compute.https://services.google.com/fh/ files/misc/private_ai_compute_technical_brief.pdf, 2025

2025

[15] [15]

Le, Hani Jamjoom, Shixuan Zhao, and Zhiqiang Lin

Zhongshu Gu, Enriquillo Valdez, Salman Ahmed, Julian James Stephen, Michael V. Le, Hani Jamjoom, Shixuan Zhao, and Zhiqiang Lin. Blue- print, bootstrap, and bridge: A security look at nvidia gpu confiden- tial computing, 2025. URL:https://api.semanticscholar.org/CorpusID: 280150420

2025

[16] [16]

Security of ai agents

Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, and Hao Chen. Security of ai agents. In2025 IEEE/ACM International Workshop on Responsible AI Engineering (RAIE), pages 45–52. IEEE, 2025

2025

[17] [17]

Shokri, Vitaly Shmatikov, and Emmett Witchel

Tyler Hunt, Congzheng Song, R. Shokri, Vitaly Shmatikov, and Emmett Witchel. Chiron: Privacy-preserving machine learning as a service,

[18] [18]

URL:https://api.semanticscholar.org/CorpusID:3970945

[19] [19]

Intel Trust Domain Execution.https://www.intel.com/content/ www/us/en/products/docs/accelerator-engines/trust-domain- extensions.html

Intel. Intel Trust Domain Execution.https://www.intel.com/content/ www/us/en/products/docs/accelerator-engines/trust-domain- extensions.html

[20] [20]

Intel Trust Authority Client for Python.https://github

Intel. Intel Trust Authority Client for Python.https://github. com/intel/trustauthority-client-for-python, 2025. Includes the trustauthority-pycli CLI for the TDX + NVIDIA H100 composite attestation profile

2025

[21] [21]

GPU remote attestation with Intel Trust Authority.https: //docs.trustauthority.intel.com/main/articles/articles/ita/concept- gpu-attestation.html, 2026

Intel. GPU remote attestation with Intel Trust Authority.https: //docs.trustauthority.intel.com/main/articles/articles/ita/concept- gpu-attestation.html, 2026

2026

[22] [22]

Je Chiao Ku and Shang Liang Chen. The deployment and implemen- tation of cloud platform for remote automatic correction of artifi- cial intelligence models.IEEE Transactions on Industrial Informatics, 21(4):3466–3474, 2025.doi:10.1109/TII.2025.3528563

work page doi:10.1109/tii.2025.3528563 2025

[23] [23]

Gonzalez, Hao Zhang, and Ion Sto- ica

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Sto- ica. Efficient memory management for large language model serving with pagedattention. InProceedings of the 29th ACM Symposium on Operating Systems Principles (SOSP), 2023

2023

[24] [24]

Cipherleaks: Breaking constant-time cryptography on amd sev via the ciphertext side channel, 2021

Mengyuan Li, Yinqian Zhang, Huibo Wang, Kang Li, and Yueqiang Cheng. Cipherleaks: Breaking constant-time cryptography on amd sev via the ciphertext side channel, 2021. URL:https://api.semanticscholar. org/CorpusID:237522096

2021

[25] [25]

Shadow in the cache: Unveiling and mitigating pri- vacy risks of kv-cache in llm inference.arXiv preprint arXiv:2508.09442, 2025

Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, and Zhan Qin. Shadow in the cache: Unveiling and mitigating privacy risks of kv-cache in llm inference. 08 2025. doi:10.48550/arXiv.2508.09442

work page doi:10.48550/arxiv.2508.09442 2025

[26] [26]

Meta-Llama-3-70B.https://huggingface.co/meta-llama/Meta- Llama-3-70B

Meta. Meta-Llama-3-70B.https://huggingface.co/meta-llama/Meta- Llama-3-70B

[27] [27]

Meta-Llama-3-8B.https://huggingface.co/meta-llama/Meta- Llama-3-8B

Meta. Meta-Llama-3-8B.https://huggingface.co/meta-llama/Meta- Llama-3-8B

[28] [28]

Mistral-Large-Instruct-2407.https://huggingface.co/mistralai/ Mistral-Large-Instruct-2407

Mistral. Mistral-Large-Instruct-2407.https://huggingface.co/mistralai/ Mistral-Large-Instruct-2407

[29] [29]

Thor: Secure transformer inference with homomorphic encryption

Jungho Moon, Dongwoo Yoo, Xiaoqian Jiang, and Miran Kim. Thor: Secure transformer inference with homomorphic encryption. InPro- ceedings of the 2025 ACM SIGSAC Conference on Computer and Com- munications Security, CCS ’25, page 3765–3779, New York, NY, USA,

2025

[30] [30]

doi:10.1145/3719027

Association for Computing Machinery. doi:10.1145/3719027. 3765150

work page doi:10.1145/3719027

[31] [31]

Deployment Guide for Confidential Computing.https: //docs.nvidia.com/cc-deployment-guide-tdx.pdf

NVIDIA. Deployment Guide for Confidential Computing.https: //docs.nvidia.com/cc-deployment-guide-tdx.pdf

[32] [32]

Hopper Multi-GPU (PPCIE) Attestation Ex- ample.https://docs.nvidia.com/attestation/quick-start- guide/latest/attestation-examples/hopper_ppcie.html

NVIDIA. Hopper Multi-GPU (PPCIE) Attestation Ex- ample.https://docs.nvidia.com/attestation/quick-start- guide/latest/attestation-examples/hopper_ppcie.html

[33] [33]

NVIDIA Confidential Computing.https://www.nvidia.com/ en-us/data-center/solutions/confidential-computing/

NVIDIA. NVIDIA Confidential Computing.https://www.nvidia.com/ en-us/data-center/solutions/confidential-computing/

[34] [34]

SPDM.https://docs.nvidia.com/networking/display/ nvidianvosusermanualfornvlinkswitchesv25022141/spdm

NVIDIA. SPDM.https://docs.nvidia.com/networking/display/ nvidianvosusermanualfornvlinkswitchesv25022141/spdm

[35] [35]

Attestation using NRAS.https://docs.nvidia.com/attestation/ poc-to-production/latest/integration-options/remote_verifier.html, 2026

NVIDIA. Attestation using NRAS.https://docs.nvidia.com/attestation/ poc-to-production/latest/integration-options/remote_verifier.html, 2026

2026

[36] [36]

AMD SEV-SNP Attestation: Establishing Trust in Guests.https://www.amd.com/content/dam/amd/en/documents/ developer/lss-snp-attestation.pdf, 2022

Jeremy Powell. AMD SEV-SNP Attestation: Establishing Trust in Guests.https://www.amd.com/content/dam/amd/en/documents/ developer/lss-snp-attestation.pdf, 2022

2022

[37] [37]

A multi-llm orchestration engine for personalized, context-rich assistance, 2024

Sumedh Rasal. A multi-llm orchestration engine for personalized, context-rich assistance, 2024. URL:https://arxiv.org/abs/2410.10039, arXiv:2410.10039

arXiv 2024

[38] [38]

PCC Hardware Design.https: //security.apple.com/documentation/private-cloud-compute/ hardwareintegrity#Hardware-design, 2024

Apple Security Research. PCC Hardware Design.https: //security.apple.com/documentation/private-cloud-compute/ hardwareintegrity#Hardware-design, 2024

2024

[39] [39]

Private Cloud Compute: A new frontier for AI privacy in the cloud - Apple Security Research.https://security

Apple Security Research. Private Cloud Compute: A new frontier for AI privacy in the cloud - Apple Security Research.https://security. apple.com/blog/private-cloud-compute/, 2024

2024

[40] [40]

Machine learning orchestration in cloud environments: Automating the training and deployment of 13 Zhou et al

I Sakthidevi, G Vinoth Rajkumar, R Sunitha, A Sangeetha, R San- thana Krishnan, and S Sundararajan. Machine learning orchestration in cloud environments: Automating the training and deployment of 13 Zhou et al. distributed machine learning ai model. In2023 7th International Con- ference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC), pages...

2023

[41] [41]

Trusted yet flexible: High-level runtimes for secure ml inference in tees.Journal of Cybersecurity and Privacy, 6(1), 2026

Nikolaos-Achilleas Steiakakis and Giorgos Vasiliadis. Trusted yet flexible: High-level runtimes for secure ml inference in tees.Journal of Cybersecurity and Privacy, 6(1), 2026. URL:https://www.mdpi.com/ 2624-800X/6/1/23,doi:10.3390/jcp6010023

work page doi:10.3390/jcp6010023 2026

[42] [42]

Martin Thomson and Christopher A. Wood. RFC 9458: Oblivious HTTP.https://www.rfc-editor.org/rfc/rfc9458.html

[43] [43]

Jean-Baptiste Truong, William Gallagher, Tian Guo, and Robert J. Walls. Memory-efficient deep learning inference in trusted execution environments, 2021.doi:10.1109/IC2E52221.2021.00031

work page doi:10.1109/ic2e52221.2021.00031 2021

[44] [44]

vLLM.https://pypi.org/project/vllm/

vllm.ai. vLLM.https://pypi.org/project/vllm/

[45] [45]

I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant LLM serving, 2025

Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant LLM serving, 2025. URL: https://www.ndss-symposium.org/ndss-paper/i-know-what-you- asked-prompt-leakage-via-kv-cache-sharing-in-multi-tenant-llm- serving/

2025

[46] [46]

I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant llm serving.Proceedings 2025 Network and Distributed System Security Symposium, 2025

Guanlong Wu, Zheng Zhang, Yao Zhang, Weili Wang, Jianyu Niu, Ye Wu, and Yinqian Zhang. I know what you asked: Prompt leakage via kv-cache sharing in multi-tenant llm serving.Proceedings 2025 Network and Distributed System Security Symposium, 2025. URL:https: //api.semanticscholar.org/CorpusID:276842968

2025

[47] [47]

Gpu travelling: Efficient confidential collaborative training with tee-enabled gpus, 2025

Shixuan Zhao, Zhongshu Gu, Salman Ahmed, Enriquillo Valdez, Hani Jamjoom, and Zhiqiang Lin. Gpu travelling: Efficient confidential collaborative training with tee-enabled gpus, 2025. doi:10.1145/ 3719027.3765029

arXiv 2025

[48] [48]

Too private to tell: Practical token theft attacks on apple intelligence, 2026

Haoling Zhou, Shixuan Zhao, Chao Wang, and Zhiqiang Lin. Too private to tell: Practical token theft attacks on apple intelligence, 2026. URL:https://arxiv.org/abs/2604.15637,arXiv:2604.15637

Pith/arXiv arXiv 2026

[49] [49]

Confidential Computing on NVIDIA Hopper GPUs: A Per- formance Benchmark Study, September 2024

Jianwei Zhu, Hang Yin, Peng Deng, Aline Almeida, and Shunfan Zhou. Confidential Computing on NVIDIA Hopper GPUs: A Per- formance Benchmark Study, September 2024. arXiv:2409.03992, doi:10.48550/arXiv.2409.03992. 14

work page doi:10.48550/arxiv.2409.03992 2024