arxiv: 2606.28958 · v1 · pith:S3VT5IDBnew · submitted 2026-06-27 · 💻 cs.MA

When Latent Agents Lie: KV-Cache Integrity in Multi-Agent LLM Collaboration

Pith reviewed 2026-06-30 08:24 UTC · model grok-4.3

classification 💻 cs.MA
keywords multi-agent LLMs · KV-cache · latent collaboration · integrity verification · HMAC manifest · security · question answering

The pith

Tampering with hidden KV-cache state can degrade multi-agent LLM answers even when visible commitments look plausible.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In multi-agent question-answering systems, specialist agents share both short visible commitments and their full KV-cache states with a coordinator model. This latent sharing improves exact-match and F1 scores over text-only collaboration on benchmarks such as HiddenBench and HotPotQA. A malicious specialist can alter the hidden KV state to reduce final performance while leaving the visible commitment unchanged and plausible. Text verifiers miss the attack, and simple magnitude checks on the state can be evaded by adaptive tampering. An HMAC-SHA256 manifest that binds specialist, session, model, visible commitment, tensor metadata, and payload digest accepts all honest payloads and rejects all recorded tampered ones.
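The exact-match (EM) and F1 figures quoted on this page are standard question-answering metrics. The sketch below shows one common way to compute them, assuming SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). The paper's actual scoring script is not reproduced here, so the function names and the normalization rules are illustrative assumptions.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

A benchmark-level score is then the mean of these per-question values, which is why F1 can exceed EM: a partially correct answer earns partial F1 credit but no EM credit.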

Core claim

Specialists each see part of the evidence, send a short commitment, and pass full KV-cache state to a coordinator. In clean runs this latent collaboration improves over a matched text-only version, reaching EM/F1 of 0.338/0.486 versus 0.231/0.369 on transformed HiddenBench with Qwen3-4B. When one specialist is malicious, changing the hidden KV state can collapse performance even when the visible commitment still looks plausible. A verifier that checks only text misses this failure mode. Simple magnitude checks catch some corruptions, but adaptive attacks evade them. The most reliable fix is an HMAC-SHA256 manifest that binds the specialist, session, model, visible commitment, tensor metadata, and payload digest.

What carries the argument

HMAC-SHA256 manifest that binds specialist identity, session, model, visible commitment, tensor metadata, and payload digest to protect KV-cache during transport.
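A minimal sketch of such a manifest, using only Python's standard library, may make the binding concrete. The field names, the JSON canonicalization, and the helper names `build_manifest` and `verify_manifest` are assumptions made for illustration; the paper's wire format and key management are not described on this page.

```python
import hashlib
import hmac
import json


def _canonical(fields: dict) -> bytes:
    # Assumed canonical form: sorted keys, no extra whitespace.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def build_manifest(key: bytes, *, specialist: str, session: str, model: str,
                   commitment: str, tensor_meta: dict, kv_bytes: bytes) -> dict:
    """Bind identity, context, visible text, and the KV payload under one tag."""
    fields = {
        "specialist": specialist,
        "session": session,
        "model": model,
        "commitment": commitment,
        "tensor_meta": tensor_meta,
        "payload_sha256": hashlib.sha256(kv_bytes).hexdigest(),
    }
    tag = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {"fields": fields, "tag": tag}


def verify_manifest(key: bytes, manifest: dict, kv_bytes: bytes, *,
                    expected_session: str, expected_model: str) -> bool:
    """Accept only if the tag, the context, and the payload digest all match."""
    fields = manifest["fields"]
    expected_tag = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected_tag, manifest["tag"]):
        return False
    if fields["session"] != expected_session or fields["model"] != expected_model:
        return False
    actual_digest = hashlib.sha256(kv_bytes).hexdigest()
    return hmac.compare_digest(fields["payload_sha256"], actual_digest)


# Hypothetical demo values; a real key would come from a key-management system.
key = b"demo-key-not-a-real-secret"
kv = bytes(range(16))  # stand-in for a serialized KV-cache tensor
manifest = build_manifest(
    key, specialist="specialist-1", session="session-001", model="Qwen3-4B",
    commitment="The answer is in document B.",
    tensor_meta={"dtype": "float16", "shape": [2, 4]}, kv_bytes=kv,
)
honest_ok = verify_manifest(key, manifest, kv,
                            expected_session="session-001", expected_model="Qwen3-4B")
tampered_ok = verify_manifest(key, manifest, kv[:-1] + b"\xff",
                              expected_session="session-001", expected_model="Qwen3-4B")
```

Because the visible commitment and the payload digest sit under the same tag, an attacker without the key cannot pair an honest-looking commitment with a modified KV payload. The scheme says nothing about a specialist that holds the key and signs a harmful payload itself.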

If this is right

  • Full-KV latent memory can improve multi-agent collaboration but must be treated as a security-sensitive object.
  • Visible text commitments alone cannot verify the integrity of shared hidden state.
  • Adaptive attacks can evade magnitude-based checks on KV tensors while still damaging answers.
  • Cryptographic binding of KV state to visible commitments preserves performance gains from latent sharing.
  • KV-cache exchanged between agents should be protected in transport rather than inspected after receipt.
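The gap between magnitude inspection and transport protection can be illustrated with a toy example. The sketch below is not the paper's attack or detector: the L2-norm check, the 10% tolerance, and the sign-flip tampering are all assumptions chosen because a sign flip preserves the norm exactly while changing every value.

```python
import hashlib
import math
import struct


def l2_norm(values: list[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


def magnitude_check(values: list[float], reference_norm: float,
                    tolerance: float = 0.10) -> bool:
    """Pass if the norm is within a relative tolerance of the reference."""
    return abs(l2_norm(values) - reference_norm) <= tolerance * reference_norm


def digest(values: list[float]) -> str:
    """SHA-256 over the little-endian float64 encoding of the values."""
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()


honest = [0.5, -1.25, 2.0, 0.75]           # stand-in for a KV tensor
reference = l2_norm(honest)

crude = [10.0 * v for v in honest]          # obvious corruption: norm grows 10x
adaptive = [-v for v in honest]             # norm-preserving sign flip

crude_passes = magnitude_check(crude, reference)        # caught by the check
adaptive_passes = magnitude_check(adaptive, reference)  # slips through
digest_changed = digest(adaptive) != digest(honest)     # but the digest differs
```

The magnitude check only constrains a summary statistic, so any tampering that keeps that statistic in range passes. A digest bound into a keyed manifest changes whenever the bytes change, regardless of how natural the tensor looks.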

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar integrity mechanisms could apply to other forms of hidden state exchange in distributed LLM systems.
  • The measured performance lift from latent sharing suggests that secure KV protocols may be worth adopting in production agent frameworks.
  • Standardized manifests for model state could reduce the attack surface when multiple models exchange internal representations.

Load-bearing premise

The 295 recorded tampered payloads and the adaptive attacks tested represent realistic threats that could be mounted against deployed multi-agent LLM systems.

What would settle it

An adaptive attack that modifies KV-cache content, changes the coordinator's answer, and still produces a payload accepted by the HMAC manifest.

Figures

Figures reproduced from arXiv: 2606.28958 by Carlos Baquero, Luís Brito.

Figure 1: Role-sequenced protocol and threat model.
Figure 2: Exact match on the full 65-record HiddenBench ...

Original abstract

LLM agents can share more than text. In some systems, an agent can send a short visible message while also passing its full KV-cache state to another model. This hidden state can help the final model combine evidence from several agents, but it is also hard to inspect. A visible message may look harmless even if the hidden state has been changed. We study this problem in a multi-agent question-answering setup. Specialists each see part of the evidence, send a short commitment, and pass full KV-cache state to a coordinator. In clean runs, this latent collaboration improves over a matched text-only version. On transformed HiddenBench with Qwen3-4B, it reaches EM/F1 of 0.338/0.486, compared with 0.231/0.369 for text collaboration. Qwen3-8B and HotPotQA runs show the same direction of improvement. The problem appears when one specialist is malicious. Some false visible commitments can steer answers. More seriously, changing the hidden KV state can collapse performance even when the visible commitment still looks plausible. A verifier that checks only text misses this failure mode. Simple magnitude checks catch some obvious corruptions, but adaptive attacks can evade them while still damaging the final answer. The most reliable fix we find is not to guess whether hidden state looks normal, but to protect it in transport. We implement an HMAC-SHA256 manifest that binds the specialist, session, model, visible commitment, tensor metadata, and payload digest. It accepts all 774 honest replayed payloads and rejects all 295 recorded tampered payloads. The main lesson is that full-KV latent memory can be useful, but it should be treated as a security-sensitive object, not as ordinary internal model state.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper examines security risks in multi-agent LLM systems where agents exchange full KV-cache states alongside short visible commitments in a question-answering setup. It reports that latent KV sharing improves exact match and F1 scores over text-only collaboration (e.g., 0.338/0.486 vs. 0.231/0.369 on HiddenBench with Qwen3-4B). It then demonstrates that tampering with the hidden KV state can degrade coordinator performance even when the visible commitment appears plausible, that magnitude-based checks can be evaded by adaptive attacks, and that an HMAC-SHA256 manifest binding specialist, session, model, commitment, tensor metadata, and payload digest perfectly separates 774 honest replayed payloads from 295 recorded tampered ones. The central recommendation is to treat KV-cache payloads as security-sensitive objects requiring cryptographic transport protection rather than relying on post-hoc inspection.

Significance. If the empirical separation holds under a representative threat model, the work is significant for highlighting an under-explored attack vector in latent multi-agent collaboration and for supplying a concrete, implementable cryptographic countermeasure. The exact acceptance/rejection counts and performance deltas provide clear, falsifiable evidence; the absence of free parameters or fitted models in the HMAC construction is a strength. The result bears on the design of any system passing internal model state between agents.

major comments (1)
  1. [Abstract, attack evaluation paragraph] The claim that the HMAC-SHA256 manifest is the most reliable fix rests on its acceptance of all 774 honest payloads and rejection of all 295 recorded tampered payloads. The manuscript provides no evidence that these 295 examples adequately sample sophisticated adaptive KV-cache modifications that preserve the visible commitment while still damaging answers, leaving the superiority over inspection methods dependent on an unverified representativeness assumption.
minor comments (1)
  1. [Abstract] The reported EM/F1 improvements are given without error bars, dataset transformation details, or statistical tests. Adding these would make the utility claim more robust, even though it is not central to the security argument.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and valuable feedback on our manuscript. We address the major comment point by point below.

point-by-point responses
  1. Referee: [Abstract, attack evaluation paragraph] The claim that the HMAC-SHA256 manifest is the most reliable fix rests on its acceptance of all 774 honest payloads and rejection of all 295 recorded tampered payloads. The manuscript provides no evidence that these 295 examples adequately sample sophisticated adaptive KV-cache modifications that preserve the visible commitment while still damaging answers, leaving the superiority over inspection methods dependent on an unverified representativeness assumption.

    Authors: We appreciate the referee pointing out this limitation in our evaluation. The 295 tampered payloads were produced by the adaptive attacks we implemented, which successfully evaded magnitude-based detection while degrading coordinator performance. We acknowledge that this finite set does not represent all conceivable sophisticated modifications that could preserve the visible commitment. However, the strength of the HMAC-SHA256 manifest lies in its cryptographic properties rather than empirical coverage: it binds the payload digest, so any change to the KV-cache alters the digest and invalidates the HMAC (provided the key remains secret). Therefore, the detection capability does not depend on the representativeness of the 295 examples. We will revise the abstract and the relevant evaluation paragraph to clarify this point, explicitly distinguishing the cryptographic guarantee from the empirical results and acknowledging the scope of the tested attacks.

    Revision: yes
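The rebuttal's point, that rejection follows from the construction rather than from which attacks were sampled, can be sanity-checked with a short sketch. It flips every single bit of a small stand-in payload and counts how many mutated payloads still verify. The helper names and the toy payload are assumptions, and the loop is a demonstration rather than a proof: the underlying guarantee is computational and rests on SHA-256 collision resistance and on the key staying secret.

```python
import hashlib
import hmac


def tag_for(key: bytes, payload: bytes) -> bytes:
    """HMAC-SHA256 over the SHA-256 digest of the payload."""
    payload_digest = hashlib.sha256(payload).digest()
    return hmac.new(key, payload_digest, hashlib.sha256).digest()


def accepts(key: bytes, payload: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(tag_for(key, payload), tag)


def count_accepted_bit_flips(key: bytes, payload: bytes, tag: bytes) -> int:
    """Try every single-bit mutation and count those that still verify."""
    accepted = 0
    for index in range(len(payload)):
        for bit in range(8):
            mutated = bytearray(payload)
            mutated[index] ^= 1 << bit
            if accepts(key, bytes(mutated), tag):
                accepted += 1
    return accepted


# Hypothetical demo values.
key = b"demo-key-not-a-real-secret"
payload = bytes(range(32))  # stand-in for serialized KV-cache bytes
tag = tag_for(key, payload)

honest_accepted = accepts(key, payload, tag)
flips_accepted = count_accepted_bit_flips(key, payload, tag)  # 256 mutations tried
```

The same reasoning also marks the limit of the defense: a verifier of this kind detects modification in transit, not a payload that was harmful when its holder signed it.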

Circularity Check

0 steps flagged

No circularity; results are direct empirical measurements on recorded payloads

full rationale

The paper reports an empirical evaluation: an HMAC-SHA256 manifest is implemented and tested on 774 honest replayed payloads (all accepted) and 295 recorded tampered payloads (all rejected). No derivation chain, equations, fitted parameters, or self-citations are present that reduce the central claim to its own inputs by construction. The representativeness of the 295 tampered examples is a validity question outside the scope of circularity analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard cryptographic assumptions and empirical payload testing; no free parameters, new entities, or ad-hoc axioms are introduced.

axioms (1)
  • [standard math] HMAC-SHA256 provides integrity when the key remains secret.
    Implicit in the claim that the manifest rejects all tampered payloads.
