Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems
Pith reviewed 2026-06-28 01:19 UTC · model grok-4.3
The pith
A unified framework organizes latent communication in LLM multi-agent systems along three orthogonal axes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing latent-communication techniques can be placed in a three-axis space defined by what continuous information travels (embeddings, hidden states, KV-caches or other states), which alignment mechanism links sender and receiver (latent-space alignment or layer alignment), and how the received information is fused (concatenation, prepending, arithmetic operations, cross-attention, or cache restoration). The resulting taxonomy places eighteen representative methods into five major design patterns and identifies recurring open challenges.
What carries the argument
The three-axis taxonomy of WHAT information is communicated, WHICH sender-receiver alignment is used, and HOW the information is fused into the receiver.
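One way to make the three axes concrete is to encode them as a small data model. The sketch below is illustrative only: the enum members mirror the options named in the core claim, but the class names (`What`, `Which`, `How`, `MethodEntry`), the treatment of WHICH as two mutually exclusive values, and the sample entry are assumptions rather than the paper's own notation.

```python
from dataclasses import dataclass
from enum import Enum


class What(Enum):
    """WHAT axis: the continuous information that travels."""
    EMBEDDING = "embedding"
    HIDDEN_STATE = "hidden_state"
    KV_CACHE = "kv_cache"
    OTHER = "other_continuous_state"


class Which(Enum):
    """WHICH axis: the sender-receiver alignment (assumed exclusive here)."""
    LATENT_SPACE_ALIGNMENT = "latent_space_alignment"
    LAYER_ALIGNMENT = "layer_alignment"


class How(Enum):
    """HOW axis: the way received information is fused."""
    CONCATENATION = "concatenation"
    PREPENDING = "prepending"
    ARITHMETIC = "arithmetic"
    CROSS_ATTENTION = "cross_attention"
    CACHE_RESTORATION = "cache_restoration"


@dataclass(frozen=True)
class MethodEntry:
    """One method placed at a single point of the three-axis grid."""
    name: str
    what: What
    which: Which
    how: How


def design_space_size() -> int:
    """Number of cells in the grid if every combination were allowed."""
    return len(What) * len(Which) * len(How)


# Hypothetical entry for illustration; not one of the paper's 18 methods.
example = MethodEntry(
    name="toy-method",
    what=What.KV_CACHE,
    which=Which.LAYER_ALIGNMENT,
    how=How.CACHE_RESTORATION,
)
```

Under these assumptions the grid has 4 × 2 × 5 = 40 cells, so eighteen methods can occupy at most eighteen of them.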
If this is right
- Eighteen representative methods from 2024-2026 can be placed into the taxonomy.
- Five major design patterns emerge from the classification.
- Open challenges include cross-architecture alignment, security of latent channels, compression for edge devices, and links to latent chain-of-thought.
- The axes supply a shared vocabulary for comparing new proposals.
Where Pith is reading between the lines
- The same axes could be used to decide whether a proposed hybrid token-plus-latent protocol fits existing patterns or introduces a new fusion axis.
- Mapping compression methods for resource-constrained agents onto the WHAT and HOW axes might show whether they form a sixth pattern or remain inside the current five.
- If layer alignment proves more robust than latent-space alignment under architecture changes, future work could test this prediction by swapping model families while holding the other two axes fixed.
Load-bearing premise
The three axes are orthogonal and together cover every relevant method without gaps or the need for extra dimensions.
What would settle it
Discovery of even one latent-communication technique that cannot be assigned to any combination of the three axes or that requires a fourth independent dimension would falsify the claimed sufficiency of the framework.
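The orthogonality premise can be probed on any table of axis assignments. The sketch below is a rough diagnostic, not the paper's procedure: it checks whether a value on one axis always co-occurs with a single value on another. The function names and the toy rows are hypothetical, and the rows are not the paper's actual categorization.

```python
from collections import defaultdict
from itertools import permutations

AXES = ("what", "which", "how")


def implied_values(rows, src, dst):
    """Map each value on axis `src` to the set of `dst` values seen with it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[src]].add(row[dst])
    return dict(seen)


def functional_dependencies(rows):
    """Return (src, dst) pairs where every src value maps to exactly one dst value."""
    deps = []
    for src, dst in permutations(AXES, 2):
        if all(len(vals) == 1 for vals in implied_values(rows, src, dst).values()):
            deps.append((src, dst))
    return deps


# Hypothetical assignments, invented for illustration only.
toy_rows = [
    {"what": "kv_cache", "which": "layer", "how": "cache_restoration"},
    {"what": "kv_cache", "which": "layer", "how": "cache_restoration"},
    {"what": "hidden_state", "which": "latent_space", "how": "arithmetic"},
    {"what": "hidden_state", "which": "layer", "how": "prepending"},
    {"what": "embedding", "which": "latent_space", "how": "concatenation"},
]

# In this toy data, KV-caches only ever appear with cache restoration.
kv_fusions = implied_values(toy_rows, "what", "how")["kv_cache"]
deps = functional_dependencies(toy_rows)
```

A dependency found in a small sample is weak evidence, since sparse data produces such patterns by chance. Conversely, the absence of a dependency does not prove that the axes are independent.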
Original abstract
Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language: agents exchange messages token-by-token, verbalising their internal reasoning so that peers can read, verify, and respond. While convenient and interpretable, this protocol suffers from three structural drawbacks -- high inference cost, irreversible information loss during discretization, and ambiguity/redundancy of natural language. A growing body of work therefore explores an alternative protocol -- latent communication -- in which agents exchange continuous representations (embeddings, hidden states, or KV-caches) directly, bypassing the bottleneck of text generation. This paper presents a unified framework for organising the rapidly expanding literature on latent communication. We analyse existing methods along three orthogonal axes: (1) WHAT information is communicated (Embeddings, Hidden States, KV-Caches, or other continuous state); (2) WHICH sender-receiver alignment is used (latent-space alignment and layer alignment); and (3) HOW the communicated information is fused into the receiver (concatenation, prepending, mathematical operations, cross-attention, or cache restoration). Under this 3-axis framework, we systematically categorise eighteen representative methods proposed between 2024 and 2026, identify five major design patterns, and surface a set of open challenges -- including cross-architecture alignment, security of latent channels, compression for edge deployment, and the relationship between latent communication and latent chain-of-thought. We hope that this framework both lowers the barrier to entry for new researchers and provides a vocabulary for comparing future work.
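To give a feel for the HOW axis, the sketch below shows two of the listed fusion styles on toy vectors: prepending and an arithmetic blend. It is an assumption-laden illustration. The function names are hypothetical, and both operations assume that sender and receiver already share one hidden size, which is the job of the WHICH axis. How the paper distinguishes concatenation from prepending is not specified here, so only prepending is shown.

```python
def _check_width(sender, receiver):
    """Require every vector on both sides to have the same hidden size."""
    widths = {len(v) for v in sender} | {len(v) for v in receiver}
    if len(widths) != 1:
        raise ValueError("all vectors must share one hidden size")


def fuse_prepend(sender, receiver):
    """Prepending: place the sender's vectors ahead of the receiver's sequence."""
    _check_width(sender, receiver)
    return [list(v) for v in sender] + [list(v) for v in receiver]


def fuse_blend(sender, receiver, alpha=0.5):
    """Arithmetic fusion: position-wise weighted sum of two equal-length sequences."""
    _check_width(sender, receiver)
    if len(sender) != len(receiver):
        raise ValueError("blend needs sequences of equal length")
    return [
        [alpha * s + (1.0 - alpha) * r for s, r in zip(s_vec, r_vec)]
        for s_vec, r_vec in zip(sender, receiver)
    ]


# Toy sequences of 2-dimensional vectors (values are arbitrary).
prepended = fuse_prepend([[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0]])
blended = fuse_blend([[0.0, 2.0]], [[2.0, 4.0]], alpha=0.5)
```

Prepending lengthens the receiver's sequence, while the blend keeps its length fixed but needs position-wise correspondence between the two sides.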
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a unified framework for organizing the literature on latent communication in LLM-based multi-agent systems, in which agents exchange continuous representations (embeddings, hidden states, KV-caches) instead of natural-language tokens. It defines three axes that it describes as orthogonal: WHAT information is communicated (Embeddings, Hidden States, KV-Caches, or other), WHICH sender-receiver alignment is used (latent-space alignment and layer alignment), and HOW the information is fused (concatenation, prepending, mathematical operations, cross-attention, or cache restoration). It applies these axes to categorize 18 representative methods from 2024-2026, identify five major design patterns, and surface open challenges such as cross-architecture alignment, security of latent channels, compression, and links to latent chain-of-thought.
Significance. If the framework is robust, it would provide a valuable common vocabulary and organizational structure for the expanding area of latent communication in multi-agent LLMs, lowering the entry barrier for researchers and enabling clearer comparisons across methods. The explicit categorization of 18 methods and extraction of five design patterns constitute a concrete contribution to structuring recent work in this space.
major comments (2)
- [Abstract] Abstract: The central claim that the three axes are orthogonal is load-bearing for the framework's utility as a systematic taxonomy, yet the description leaves open whether WHAT (e.g., KV-Caches) systematically constrains feasible choices in WHICH (layer alignment) and HOW (cache restoration). If such mechanical dependencies exist, the axes are not independent and the categorization of the 18 methods may require re-examination or additional dimensions to avoid artificial groupings.
- [Categorization of methods] The section describing the application to 18 methods: No explicit validation or inter-annotator agreement is mentioned for the categorization into the 3-axis framework or the derivation of the five design patterns. Without such grounding, it is unclear whether the taxonomy is reproducible or whether alternative classifications would yield different patterns.
minor comments (2)
- [Abstract] The abstract states the methods span 2024-2026 but does not describe the literature search protocol or inclusion criteria used to select the 18 representative works; adding this would strengthen reproducibility of the taxonomy.
- Notation for the three axes (WHAT, WHICH, HOW) is introduced without a summary table that cross-tabulates all 18 methods; such a table would improve readability and allow readers to verify the claimed coverage.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the framework's foundations and its application. We address each major point below.
Point-by-point responses
- Referee: [Abstract] Abstract: The central claim that the three axes are orthogonal is load-bearing for the framework's utility as a systematic taxonomy, yet the description leaves open whether WHAT (e.g., KV-Caches) systematically constrains feasible choices in WHICH (layer alignment) and HOW (cache restoration). If such mechanical dependencies exist, the axes are not independent and the categorization of the 18 methods may require re-examination or additional dimensions to avoid artificial groupings.
  Authors: The axes are defined as orthogonal because each captures an independent design decision: WHAT selects the representation type, WHICH specifies alignment requirements between agents, and HOW determines the fusion operation. While certain pairings appear more frequently in practice (e.g., KV-caches with cache restoration), these are empirical tendencies rather than logical necessities; the framework explicitly permits cross-combinations, and the 18-method categorization includes diverse mixes without artificial forcing. We will add a short discussion subsection clarifying this distinction between conceptual orthogonality and observed preferences.
  Revision: partial
- Referee: [Categorization of methods] The section describing the application to 18 methods: No explicit validation or inter-annotator agreement is mentioned for the categorization into the 3-axis framework or the derivation of the five design patterns. Without such grounding, it is unclear whether the taxonomy is reproducible or whether alternative classifications would yield different patterns.
  Authors: Categorization was performed by mapping each method's described mechanisms directly to the three axes using the technical details in the source papers; the five patterns were identified by grouping the resulting assignments. No formal inter-annotator agreement was computed. We agree that greater transparency improves reproducibility, and the revision will include an appendix table listing the axis assignment and supporting excerpt for every method.
  Revision: yes
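For reference, the inter-annotator agreement the referee asks about could be computed per axis with Cohen's kappa, a standard chance-corrected agreement statistic for two annotators. The sketch below is a minimal illustration and not something the paper reports. The function name and the two label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items on one axis."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label; kappa is undefined,
        # and returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical WHAT-axis labels from two annotators for four methods.
annotator_1 = ["kv_cache", "kv_cache", "hidden_state", "embedding"]
annotator_2 = ["kv_cache", "hidden_state", "hidden_state", "embedding"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

With these toy labels the observed agreement is 3/4 and the chance agreement is 5/16, so kappa is 7/11, or about 0.64. Running the same computation separately for WHAT, WHICH, and HOW would show which axis is hardest to label consistently.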
Circularity Check
No circularity: purely descriptive taxonomy with no derivations or self-referential reductions
The paper offers a descriptive 3-axis taxonomy for organizing existing latent-communication methods in the literature. No equations, fitted parameters, predictions, or load-bearing self-citations appear in the provided text; the axes are presented as an organizational lens rather than derived from prior results by the same authors. The claim of orthogonality is an assumption about the framework's utility, not a self-definitional or fitted-input reduction. This is the normal case of a survey paper whose contribution is classification rather than derivation.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The three axes (WHAT, WHICH, HOW) provide an orthogonal and comprehensive categorization of latent communication methods.
Reference graph
Works this paper leans on
-
[1]
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Shaokun Zhang, Sashank Khosla, et al. Autogen: Enabling next-gen llm applications via multi-agent conversation. arXiv preprint arXiv:2308.08155,
-
[2]
Latent Communication: A PREPRINT
Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, et al. Metagpt: Meta programming for a multi-agent collaborative framework. arXiv preprint arXiv:2308.00352,
-
[3]
Guohua Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. Camel: Communicative agents for “mind” exploration of llm society. arXiv preprint arXiv:2303.17760,
-
[4]
Yingzhuo Liu, Shuodi Liu, Hongsong Tang, Yubing Ma, Zikang Li, Junge Zhang, Liuyu Xiang, and Zhaofeng He. Rainbowarena: A multi-agent toolkit for reinforcement learning and large language models in tabletop games. Knowledge-Based Systems, 333:115046, 2026a. doi:10.1016/j.knosys.2025.115046. Shuodi Liu, Yingzhuo Liu, Zi Wang, Yusheng Wang, Huijia Wu, Liuyu...
-
[5]
URL https://arxiv.org/abs/2310.06272. Rong Ye, Xu Zhang, Yizheng Pang, Peng Qi, Zhongwei Wang, et al. Communicating activations between language model agents. In International Conference on Machine Learning (ICML),
-
[6]
URL https://arxiv.org/abs/2501.14082. Xiao Du et al. Enabling agents to communicate entirely in latent space. arXiv preprint arXiv:2511.09149,
-
[7]
Runlin Yang, Jucheng Cao, Zhe Zhang, et al. Augmenting multi-agent communication with state delta trajectory. arXiv preprint arXiv:2506.19209,
-
[8]
Ming Li et al. Thought communication in multiagent collaboration. arXiv preprint arXiv:2510.20733,
-
[9]
Jacob Fein-Ashley, Dhruv Parikh, Rajgopal Kannan, and Viktor Prasanna. Mixture of thoughts: Learning to aggregate what experts think, not just what they say. arXiv preprint arXiv:2509.21164,
-
[10]
Yifan Wang et al. Kvcomm: Enabling efficient llm communication through selective kv sharing. arXiv preprint arXiv:2510.03346, 2025a. Accepted at ICLR
-
[11]
Jiaming Liu et al. Cache-to-cache: Direct semantic communication between large language models. arXiv preprint arXiv:2510.03215, 2025b. Zhenyu Wang et al. Latent collaboration in multi-agent systems. arXiv preprint arXiv:2511.20639, 2025b. Kwangyoun Park et al. Q-kvcomm: Efficient multi-agent communication via adaptive kv cache compression. arXiv preprint ar...
-
[12]
Hyesung Jeon, Hyeongju Ha, and Jae-Joon Kim. Lragent: Efficient kv cache sharing for multi-lora llm agents. arXiv preprint arXiv:2602.01053,
-
[13]
Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, and Jiang Liu. Relaycaching: Accelerating llm collaboration via decoding kv cache reuse. arXiv preprint arXiv:2603.13289,
-
[14]
Yakov Pyotr Shkolnikov. Agent memory below the prompt: Persistent q4 kv cache for multi-agent llm inference on edge devices. arXiv preprint arXiv:2603.04428,
-
[15]
Haibo Jin, Peng Kuang, Ye Yu, Xiaopeng Yuan, and Haohan Wang. Agent primitives: Reusable latent building blocks for multi-agent systems. arXiv preprint arXiv:2602.03695,
-
[16]
Seunghun Lee, Jihong Park, Ce Zheng, and Hyuncheol Park. Low-latency edge llm handover via joint kv cache transfer and token prefill. arXiv preprint arXiv:2603.28018,
-
[17]
Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He, Feijie Wu, Hoin Jung, Matt Fredrikson, Xiaoqian Wang, and Jing Gao. The vision wormhole: Latent-space communication in heterogeneous multi-agent systems. arXiv preprint arXiv:2602.15382, 2026b. Guangfu Hao, Yuming Dai, Xianzhe Qin, and Shan Yu. Brain-inspired graph multi-agent systems for llm re...
-
[18]
Jingdi Chen, Hanqing Yang, Zongjun Liu, and Carlee Joe-Wong. The five ws of multi-agent communication: Who talks to whom, when, what, and why — a survey from marl to emergent language and llms.arXiv preprint arXiv:2602.11583,
-
[19]
Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Reasoning in latent space: An unconstrained chain-of-thought. arXiv preprint arXiv:2412.06769,
-
[20]
Tian Wang et al. Seek in the dark: Reasoning via test-time instance-level policy gradient in latent space. arXiv preprint arXiv:2505.13308, 2025c. Awesome Latent Space Contributors. Awesome latent space. https://github.com/YU-deep/Awesome-Latent-Space,
-
[21]
Angeliki Lazaridou and Marco Baroni. Emergent multi-agent communication in deep reinforcement learning. arXiv preprint arXiv:1706.02295,
-
[22]
A. METHOD QUICK-REFERENCE TABLE
Table 6: Quick-reference for all 18 methods. WHAT / WHICH / HOW refer to the three axes of the unified framework (Section 4).
Method | WHAT | WHICH | HOW | Train?
CIPHER [Liu et al., 2024] | Weighted Embedding | Last→First | Concat | ✓
AC [Ye et al., 2025] | Hidden State (sel. layer, last tok.) | Sel.→Sel. | Math | ✓
Interlat [Du et al., 2026] | Hidden ...