Beyond tokens: a unified framework for latent communication in LLM-based multi-agent systems

Yingzhuo Liu

arxiv: 2606.05711 · v2 · pith:JY3KBTJLnew · submitted 2026-06-04 · 💻 cs.CL

Pith reviewed 2026-06-28 01:19 UTC · model grok-4.3

classification 💻 cs.CL

keywords: latent communication, multi-agent systems, large language models, continuous representations, information fusion, alignment, KV-caches, embeddings

The pith

A unified framework organizes latent communication in LLM multi-agent systems along three orthogonal axes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a taxonomy for latent communication, in which agents exchange continuous internal states rather than generating text tokens. It classifies methods by the type of state shared, the alignment between sender and receiver, and the way the state is integrated at the receiving agent. The motivation is twofold: token exchange carries high inference cost, irreversible discretization loss, and linguistic ambiguity, and the growing set of alternatives lacks a shared vocabulary for comparing and extending them. The framework is applied to eighteen methods published between 2024 and 2026, yielding five recurring design patterns and a list of open problems.

Core claim

Existing latent-communication techniques can be placed in a three-axis space defined by what continuous information travels (embeddings, hidden states, KV-caches or other states), which alignment mechanism links sender and receiver (latent-space alignment or layer alignment), and how the received information is fused (concatenation, prepending, arithmetic operations, cross-attention, or cache restoration). The resulting taxonomy places eighteen representative methods into five major design patterns and identifies recurring open challenges.
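The HOW values listed above can be made concrete with a minimal sketch over plain Python lists. It illustrates the fusion operations under stated assumptions and is not code from any surveyed method: "concatenation" is read here as appending message vectors after the receiver's sequence, "prepending" as placing them in front, the arithmetic case is a weighted element-wise sum, and cross-attention is single-head with no learned projections. Cache restoration is omitted because it operates on per-layer key/value tensors rather than a single vector sequence.

```python
import math

Vector = list[float]


def fuse_concatenate(receiver: list[Vector], message: list[Vector]) -> list[Vector]:
    """Concatenation (read here as appending): message vectors follow the receiver's positions."""
    return receiver + message


def fuse_prepend(receiver: list[Vector], message: list[Vector]) -> list[Vector]:
    """Prepending: message vectors are placed in front, like a soft prompt."""
    return message + receiver


def fuse_add(receiver_state: Vector, message_state: Vector, weight: float = 1.0) -> Vector:
    """Arithmetic fusion: element-wise h_receiver + weight * h_sender."""
    if len(receiver_state) != len(message_state):
        raise ValueError("arithmetic fusion needs equal hidden sizes")
    return [r + weight * s for r, s in zip(receiver_state, message_state)]


def fuse_cross_attention(query: Vector, message: list[Vector]) -> Vector:
    """Single-head scaled dot-product attention from one receiver query over the
    message vectors, which serve as both keys and values (no learned projections)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in message]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * value[j] for w, value in zip(weights, message)) for j in range(d)]
```

The arithmetic case is the only one that requires sender and receiver widths to match exactly, which is one reason fusion choices interact with the alignment axis.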

What carries the argument

The three-axis taxonomy of WHAT information is communicated, WHICH sender-receiver alignment is used, and HOW the information is fused into the receiver.
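One way to make the three axes operational is to encode each as a closed vocabulary and place a method by looking up its labels. The sketch below is a hypothetical encoding, not the paper's own: the enum names are invented for illustration, "other continuous state" is collapsed into a single WHAT value, and a label that no axis value covers raises an error, which is exactly the kind of gap the framework's coverage premise rules out.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class What(Enum):
    """WHAT continuous information is sent."""
    EMBEDDING = "embedding"
    HIDDEN_STATE = "hidden state"
    KV_CACHE = "KV-cache"
    OTHER = "other continuous state"


class Which(Enum):
    """WHICH sender-receiver alignment is used."""
    LATENT_SPACE = "latent-space alignment"
    LAYER = "layer alignment"


class How(Enum):
    """HOW the information is fused into the receiver."""
    CONCATENATION = "concatenation"
    PREPENDING = "prepending"
    MATH = "mathematical operation"
    CROSS_ATTENTION = "cross-attention"
    CACHE_RESTORATION = "cache restoration"


@dataclass(frozen=True)
class Placement:
    """A method's coordinates in the three-axis space."""
    method: str
    what: What
    which: Which
    how: How


def place(method: str, what: str, which: str, how: str) -> Placement:
    """Look up each label on its axis; Enum lookup raises ValueError for a label
    that no axis value covers, i.e. a candidate gap in the taxonomy."""
    return Placement(method, What(what), Which(which), How(how))


def design_space() -> list[tuple[What, Which, How]]:
    """All combinations the axes would allow if they were fully independent."""
    return list(product(What, Which, How))
```

Under this encoding a fully orthogonal framework would admit 4 × 2 × 5 = 40 cells; whether all of them are real design options is the question the editorial analysis below raises.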

If this is right

Eighteen representative methods from 2024-2026 can be placed into the taxonomy.
Five major design patterns emerge from the classification.
Open challenges include cross-architecture alignment, security of latent channels, compression for edge devices, and links to latent chain-of-thought.
The axes supply a shared vocabulary for comparing new proposals.
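The compression challenge in the list above typically starts from low-bit quantization of the transmitted tensors. The sketch below is a generic symmetric, per-group 4-bit quantizer written only to show the precision-for-size trade; it is an illustrative assumption and is not claimed to be the scheme of any surveyed method, including the Q4 KV-cache work in the reference list.

```python
def quantize_4bit(values: list[float], group_size: int = 4) -> tuple[list[int], list[float]]:
    """Symmetric round-to-nearest 4-bit quantization with one float scale per group.
    Codes stay in [-7, 7] because the scale maps the group's max magnitude to 7."""
    codes: list[int] = []
    scales: list[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero groups
        scales.append(scale)
        codes.extend(round(v / scale) for v in group)
    return codes, scales


def dequantize_4bit(codes: list[int], scales: list[float], group_size: int = 4) -> list[float]:
    """Recover approximate values: each code times the scale of its group."""
    return [code * scales[i // group_size] for i, code in enumerate(codes)]
```

Each value shrinks from 16 bits (fp16) to 4 bits plus a shared per-group scale, and round-to-nearest keeps each reconstruction error within half of its group's scale.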

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same axes could be used to decide whether a proposed hybrid token-plus-latent protocol fits existing patterns or introduces a new fusion axis.
Mapping compression methods for resource-constrained agents onto the WHAT and HOW axes might show whether they form a sixth pattern or remain inside the current five.
Whether layer alignment is more robust than latent-space alignment under architecture changes could be tested directly by swapping model families while holding the WHAT and HOW axes fixed.

Load-bearing premise

The three axes are orthogonal and together cover every relevant method without gaps or the need for extra dimensions.

What would settle it

Discovery of even one latent-communication technique that cannot be assigned to any combination of the three axes or that requires a fourth independent dimension would falsify the claimed sufficiency of the framework.

Figures

Figures reproduced from arXiv: 2606.05711 by Yingzhuo Liu.

**Figure 2.** Why latent communication wins on information density.

**Figure 3.** The unified 3-axis framework. The three axes …

**Figure 4.** Method categorisation tree. Each leaf corresponds to one of the 18 methods surveyed in this paper. Green …

**Figure 5.** Mind map of six open problems in latent communication. Each branch represents a research direction with …

**Figure 6.** Venn diagram showing the overlap and distinction between Latent Communication (multi-agent) and Latent …
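The information-density argument in Figure 2's caption can be approximated with back-of-envelope arithmetic. The sketch compares upper bounds only: a sampled token carries at most log2(V) bits, while a hidden-state vector occupies d × (bits per value) bits of raw storage, which bounds but does not measure its useful content. The vocabulary size and hidden width used below are illustrative assumptions, not values taken from the figure.

```python
import math


def token_bits_upper_bound(vocab_size: int) -> float:
    """One sampled token picks one of vocab_size symbols, so it carries at most log2(V) bits."""
    return math.log2(vocab_size)


def hidden_state_raw_bits(hidden_dim: int, bits_per_value: int = 16) -> int:
    """Raw storage capacity of one hidden-state vector; an upper bound, not its useful content."""
    return hidden_dim * bits_per_value


if __name__ == "__main__":
    per_token = token_bits_upper_bound(32_000)  # illustrative vocabulary size
    per_vector = hidden_state_raw_bits(4_096)   # illustrative width with fp16 values
    print(f"token: {per_token:.2f} bits, vector: {per_vector} bits, "
          f"ratio: about {per_vector / per_token:.0f}x")
```

With these illustrative values a token carries under 15 bits, while the vector occupies 65,536 bits; the gap in raw capacity is several orders of magnitude, although how much of it a receiver can exploit is an empirical question.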

Original abstract

Multi-agent systems built on large language models (LLMs) have become a prevailing paradigm for tackling complex reasoning, planning, and tool-use tasks. The dominant communication protocol in such systems is natural language: agents exchange messages token-by-token, verbalising their internal reasoning so that peers can read, verify, and respond. While convenient and interpretable, this protocol suffers from three structural drawbacks -- high inference cost, irreversible information loss during discretization, and ambiguity/redundancy of natural language. A growing body of work therefore explores an alternative protocol -- latent communication -- in which agents exchange continuous representations (embeddings, hidden states, or KV-caches) directly, bypassing the bottleneck of text generation. This paper presents a unified framework for organising the rapidly expanding literature on latent communication. We analyse existing methods along three orthogonal axes: (1) WHAT information is communicated (Embeddings, Hidden States, KV-Caches, or other continuous state); (2) WHICH sender-receiver alignment is used (latent-space alignment and layer alignment); and (3) HOW the communicated information is fused into the receiver (concatenation, prepending, mathematical operations, cross-attention, or cache restoration). Under this 3-axis framework, we systematically categorise eighteen representative methods proposed between 2024 and 2026, identify five major design patterns, and surface a set of open challenges -- including cross-architecture alignment, security of latent channels, compression for edge deployment, and the relationship between latent communication and latent chain-of-thought. We hope that this framework both lowers the barrier to entry for new researchers and provides a vocabulary for comparing future work.
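The "irreversible information loss during discretization" mentioned above shows up even in a toy example. A "Weighted Embedding" is the WHAT entry Table 6 lists for CIPHER; the sketch reads it as the probability-weighted average of token embeddings under the sender's next-token distribution. This is an assumption for illustration, not CIPHER's implementation, and the three-token vocabulary is hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    top = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_token_embedding(logits: list[float], table: list[list[float]]) -> list[float]:
    """Token protocol: commit to the most likely token; only its embedding survives."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return table[best]


def weighted_embedding(logits: list[float], table: list[list[float]]) -> list[float]:
    """Latent alternative: expected embedding under the sender's next-token distribution."""
    probs = softmax(logits)
    width = len(table[0])
    return [sum(p * row[j] for p, row in zip(probs, table)) for j in range(width)]
```

A confident sender and an uncertain one can emit the same argmax token, so the token protocol makes them indistinguishable to the receiver; the weighted embedding still differs between the two and preserves the sender's uncertainty.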

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a survey paper offering a 3-axis taxonomy for latent communication in multi-agent LLMs, but the axes are not orthogonal.

The main thing to know is that this paper proposes a three-axis framework to organize latent communication methods in LLM-based multi-agent systems and applies it to eighteen recent papers, pulling out five design patterns.

It does a solid job of summarizing why token-based communication has limits such as high cost and information loss, then maps the shift to sharing embeddings, hidden states, or KV-caches. Combining what gets sent, how sender and receiver align, and how the receiver fuses the information into a single set of axes is new, and the list of open problems around security, cross-architecture work, and compression gives the field some useful language.

The soft spot is the orthogonality claim. The type of information shared directly limits the alignment and fusion choices, so the dimensions are dependent rather than independent. KV-cache methods, for example, are mechanically tied to layer alignment and cache restoration, while embeddings allow more options. This creates a risk of forcing categories instead of reflecting real constraints.
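The coupling described in the previous paragraph follows from tensor shapes alone: a KV-cache holds one key tensor and one value tensor per layer, so restoring it verbatim presupposes a layer-by-layer match with the receiver. The sketch below checks that precondition; the configuration field names are hypothetical and not tied to any library, and a method that inserts a learned mapping between caches would relax the check.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CacheLayout:
    """Attention settings that fix the shape of each layer's K and V tensors.
    Field names are illustrative, not taken from any specific library."""
    num_layers: int
    num_kv_heads: int
    head_dim: int


def layout_mismatches(sender: CacheLayout, receiver: CacheLayout) -> list[str]:
    """Names of the settings that differ; any mismatch blocks verbatim cache restoration."""
    return [f.name for f in fields(CacheLayout)
            if getattr(sender, f.name) != getattr(receiver, f.name)]


def kv_cache_bytes(layout: CacheLayout, seq_len: int, bytes_per_value: int = 2) -> int:
    """Size of a full cache: K and V (factor 2) for every layer, head, position, and channel."""
    return 2 * layout.num_layers * layout.num_kv_heads * layout.head_dim * seq_len * bytes_per_value
```

The size function also shows why moving a cache is heavier than moving a single vector: with 32 layers, 8 KV heads of width 128, fp16 values, and 1,000 positions, the cache occupies about 131 MB, which ties this WHAT choice to the compression question.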

There are no experiments, proofs, or new data, which is expected for a taxonomy but means the value rests entirely on whether the groupings feel natural once you look at the details.

This is for researchers already working on or entering multi-agent LLM systems who need a map of the latent communication literature. It deserves peer review because the organization is a reasonable starting point even if the axes need adjustment.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a unified framework for organizing the literature on latent communication in LLM-based multi-agent systems, in which agents exchange continuous representations (embeddings, hidden states, KV-caches) instead of natural-language tokens. It defines three orthogonal axes: WHAT information is communicated (Embeddings, Hidden States, KV-Caches, or other), WHICH sender-receiver alignment is used (latent-space alignment and layer alignment), and HOW the information is fused (concatenation, prepending, mathematical operations, cross-attention, or cache restoration). It applies this framework to systematically categorize 18 representative methods from 2024-2026, identify five major design patterns, and surface open challenges such as cross-architecture alignment, security of latent channels, compression, and links to latent chain-of-thought.

Significance. If the framework is robust, it would provide a valuable common vocabulary and organizational structure for the expanding area of latent communication in multi-agent LLMs, lowering the entry barrier for researchers and enabling clearer comparisons across methods. The explicit categorization of 18 methods and extraction of five design patterns constitute a concrete contribution to structuring recent work in this space.

major comments (2)

[Abstract] The central claim that the three axes are orthogonal is load-bearing for the framework's utility as a systematic taxonomy, yet the description leaves open whether WHAT (e.g., KV-Caches) systematically constrains feasible choices in WHICH (layer alignment) and HOW (cache restoration). If such mechanical dependencies exist, the axes are not independent and the categorization of the 18 methods may require re-examination or additional dimensions to avoid artificial groupings.
[Categorization of methods] The section describing the application to 18 methods: No explicit validation or inter-annotator agreement is mentioned for the categorization into the 3-axis framework or the derivation of the five design patterns. Without such grounding, it is unclear whether the taxonomy is reproducible or whether alternative classifications would yield different patterns.

minor comments (2)

[Abstract] The abstract states the methods span 2024-2026 but does not describe the literature search protocol or inclusion criteria used to select the 18 representative works; adding this would strengthen reproducibility of the taxonomy.
Notation for the three axes (WHAT, WHICH, HOW) is introduced without a summary table that cross-tabulates all 18 methods; such a table would improve readability and allow readers to verify the claimed coverage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the framework's foundations and its application. We address each major point below.

Referee: [Abstract] The central claim that the three axes are orthogonal is load-bearing for the framework's utility as a systematic taxonomy, yet the description leaves open whether WHAT (e.g., KV-Caches) systematically constrains feasible choices in WHICH (layer alignment) and HOW (cache restoration). If such mechanical dependencies exist, the axes are not independent and the categorization of the 18 methods may require re-examination or additional dimensions to avoid artificial groupings.

Authors: The axes are defined as orthogonal because each captures an independent design decision: WHAT selects the representation type, WHICH specifies alignment requirements between agents, and HOW determines the fusion operation. While certain pairings appear more frequently in practice (e.g., KV-caches with cache restoration), these are empirical tendencies rather than logical necessities; the framework explicitly permits cross-combinations, and the 18-method categorization includes diverse mixes without artificial forcing. We will add a short discussion subsection clarifying this distinction between conceptual orthogonality and observed preferences. revision: partial
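The distinction between conceptual orthogonality and observed preference can be examined by tabulating axis assignments. The sketch below counts methods per cell and per pair of axes; the axis labels are shortened and the sample assignments are hypothetical, not the paper's eighteen placements.

```python
from collections import Counter
from itertools import product

WHAT = ("embedding", "hidden state", "KV-cache", "other")
WHICH = ("latent-space", "layer")
HOW = ("concatenation", "prepending", "math", "cross-attention", "cache restoration")
GRID = set(product(WHAT, WHICH, HOW))


def occupied_cells(assignments: list[tuple[str, str, str]]) -> Counter:
    """Count methods per (WHAT, WHICH, HOW) cell, rejecting labels outside the grid."""
    outside = [a for a in assignments if a not in GRID]
    if outside:
        raise ValueError(f"labels outside the three axes: {outside}")
    return Counter(assignments)


def axis_pair_counts(assignments: list[tuple[str, str, str]], i: int, j: int) -> Counter:
    """Marginal co-occurrence of two axes (0 = WHAT, 1 = WHICH, 2 = HOW)."""
    return Counter((a[i], a[j]) for a in assignments)
```

Counts alone cannot distinguish an empty cell that is merely unexplored from one that is mechanically impossible, so a table like this supports, but does not replace, the argument the promised subsection needs to make.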
Referee: [Categorization of methods] The section describing the application to 18 methods: No explicit validation or inter-annotator agreement is mentioned for the categorization into the 3-axis framework or the derivation of the five design patterns. Without such grounding, it is unclear whether the taxonomy is reproducible or whether alternative classifications would yield different patterns.

Authors: Categorization was performed by mapping each method's described mechanisms directly to the three axes using the technical details in the source papers; the five patterns were identified by grouping the resulting assignments. No formal inter-annotator agreement was computed. We agree that greater transparency improves reproducibility, and the revision will include an appendix table listing the axis assignment and supporting excerpt for every method. revision: yes
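The promised appendix table would also make agreement measurable: two annotators could label each method independently on each axis, and Cohen's kappa corrects their raw agreement for chance. The sketch is a standard two-annotator kappa; the authors do not name a specific agreement statistic, and the labels in the test are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item; kappa is undefined,
        # and returning 1.0 is a convention for perfect but trivial agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

Computing kappa separately for WHAT, WHICH, and HOW would show which axis is hardest to assign consistently.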

Circularity Check

0 steps flagged

No circularity: purely descriptive taxonomy with no derivations or self-referential reductions

The paper offers a descriptive 3-axis taxonomy for organizing existing latent-communication methods in the literature. No equations, fitted parameters, predictions, or load-bearing self-citations appear in the provided text; the axes are presented as an organizational lens rather than derived from prior results by the same authors. The claim of orthogonality is an assumption about the framework's utility, not a self-definitional or fitted-input reduction. This is the normal case of a survey paper whose contribution is classification rather than derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

As a survey and taxonomy paper, the main addition is the organizational structure rather than new parameters or entities.

axioms (1)

domain assumption The three axes (WHAT, WHICH, HOW) provide an orthogonal and comprehensive categorization of latent communication methods.
This underpins the entire framework described in the abstract.

pith-pipeline@v0.9.1-grok · 5824 in / 1308 out tokens · 61979 ms · 2026-06-28T01:19:46.746340+00:00 · methodology

Reference graph

Works this paper leans on

22 extracted references · 1 canonical work pages

[1] Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Shaokun Zhang, Sashank Khosla, et al. AutoGen: Enabling next-gen LLM applications via multi-agent conversation. arXiv preprint arXiv:2308.08155.

[2] Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, et al. MetaGPT: Meta programming for a multi-agent collaborative framework. arXiv preprint arXiv:2308.00352.

[3] Guohua Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, and Bernard Ghanem. CAMEL: Communicative agents for “mind” exploration of LLM society. arXiv preprint arXiv:2303.17760.

[4] Yingzhuo Liu, Shuodi Liu, Hongsong Tang, Yubing Ma, Zikang Li, Junge Zhang, Liuyu Xiang, and Zhaofeng He. Rainbowarena: A multi-agent toolkit for reinforcement learning and large language models in tabletop games. Knowledge-Based Systems, 333:115046, 2026a. doi:10.1016/j.knosys.2025.115046. Shuodi Liu, Yingzhuo Liu, Zi Wang, Yusheng Wang, Huijia Wu, Liuyu...

[5] … URL https://arxiv.org/abs/2310.06272. Rong Ye, Xu Zhang, Yizheng Pang, Peng Qi, Zhongwei Wang, et al. Communicating activations between language model agents. In International Conference on Machine Learning (ICML).

[6] … URL https://arxiv.org/abs/2501.14082. Xiao Du et al. Enabling agents to communicate entirely in latent space. arXiv preprint arXiv:2511.09149.

[7] Runlin Yang, Jucheng Cao, Zhe Zhang, et al. Augmenting multi-agent communication with state delta trajectory. arXiv preprint arXiv:2506.19209.

[8] Ming Li et al. Thought communication in multiagent collaboration. arXiv preprint arXiv:2510.20733.

[9] Jacob Fein-Ashley, Dhruv Parikh, Rajgopal Kannan, and Viktor Prasanna. Mixture of thoughts: Learning to aggregate what experts think, not just what they say. arXiv preprint arXiv:2509.21164.

[10] Yifan Wang et al. Kvcomm: Enabling efficient LLM communication through selective KV sharing. arXiv preprint arXiv:2510.03346, 2025a. Accepted at ICLR.

[11] Jiaming Liu et al. Cache-to-cache: Direct semantic communication between large language models. arXiv preprint arXiv:2510.03215, 2025b. Zhenyu Wang et al. Latent collaboration in multi-agent systems. arXiv preprint arXiv:2511.20639, 2025b. Kwangyoun Park et al. Q-kvcomm: Efficient multi-agent communication via adaptive KV cache compression. arXiv preprint ar...

[12] Hyesung Jeon, Hyeongju Ha, and Jae-Joon Kim. Lragent: Efficient KV cache sharing for multi-LoRA LLM agents. arXiv preprint arXiv:2602.01053.

[13] Yingsheng Geng, Yuchong Gao, Weihong Wu, Guyue Liu, and Jiang Liu. Relaycaching: Accelerating LLM collaboration via decoding KV cache reuse. arXiv preprint arXiv:2603.13289.

[14] Yakov Pyotr Shkolnikov. Agent memory below the prompt: Persistent Q4 KV cache for multi-agent LLM inference on edge devices. arXiv preprint arXiv:2603.04428.

[15] Haibo Jin, Peng Kuang, Ye Yu, Xiaopeng Yuan, and Haohan Wang. Agent primitives: Reusable latent building blocks for multi-agent systems. arXiv preprint arXiv:2602.03695.

[16] Seunghun Lee, Jihong Park, Ce Zheng, and Hyuncheol Park. Low-latency edge LLM handover via joint KV cache transfer and token prefill. arXiv preprint arXiv:2603.28018.

[17] Xiaoze Liu, Ruowang Zhang, Weichen Yu, Siheng Xiong, Liu He, Feijie Wu, Hoin Jung, Matt Fredrikson, Xiaoqian Wang, and Jing Gao. The vision wormhole: Latent-space communication in heterogeneous multi-agent systems. arXiv preprint arXiv:2602.15382, 2026b. Guangfu Hao, Yuming Dai, Xianzhe Qin, and Shan Yu. Brain-inspired graph multi-agent systems for LLM re...

[18] Jingdi Chen, Hanqing Yang, Zongjun Liu, and Carlee Joe-Wong. The five Ws of multi-agent communication: Who talks to whom, when, what, and why; a survey from MARL to emergent language and LLMs. arXiv preprint arXiv:2602.11583.

[19] Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, and Yuandong Tian. Reasoning in latent space: An unconstrained chain-of-thought. arXiv preprint arXiv:2412.06769.

[20] Tian Wang et al. Seek in the dark: Reasoning via test-time instance-level policy gradient in latent space. arXiv preprint arXiv:2505.13308, 2025c. Awesome Latent Space Contributors. Awesome latent space. https://github.com/YU-deep/Awesome-Latent-Space.

[21] Angeliki Lazaridou and Marco Baroni. Emergent multi-agent communication in deep reinforcement learning. arXiv preprint arXiv:1706.02295.

[22] Appendix A. Method quick-reference table. Table 6: Quick-reference for all 18 methods. WHAT / WHICH / HOW refer to the three axes of the unified framework (Section 4).

| Method | WHAT | WHICH | HOW | Train? |
|---|---|---|---|---|
| CIPHER [Liu et al., 2024] | Weighted Embedding | Last→First | Concat | ✓ |
| AC [Ye et al., 2025] | Hidden State (sel. layer, last tok.) | Sel.→Sel. | Math | ✓ |
| Interlat [Du et al., 2026] | Hidden … | … | … | … |
