When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches. However, full KV relay incurs high memory and communication costs. We adapt KV-cache eviction methods to this setting and introduce Orthogonal BackFill (OBF) to mitigate information loss from hard eviction. OBF injects a low-rank orthogonal residual from discarded KV states into the retained KV states. We evaluate OBF against full KV relay on nine benchmarks spanning mathematical reasoning, expert and commonsense QA, and coding. With only 9.9% to 20.2% of the prompt KV states retained, H-OBF delivers between 97% and 120% of full KV relay's per-benchmark accuracy across the nine benchmarks. This suggests that more information does not necessarily lead to better communication; preserving the most useful information matters more. Our codebase is included in the supplementary material and is publicly available at https://github.com/markli404/When-Less-Latent-Leads-to-Better-Relay.
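The abstract gives only a one-sentence description of OBF, so the following is an illustrative sketch of the general idea and not the paper's algorithm. It assumes NumPy, treats a single attention head's keys as a matrix of shape (tokens, head dimension), uses random numbers as stand-in importance scores, and invents the injection rule (adding the mean low-rank residual to every retained row). The function names `evict` and `orthogonal_backfill` and the parameters `rank` and `scale` are hypothetical.

```python
import numpy as np


def evict(states, scores, keep):
    """Hard eviction: split rows into the `keep` highest-scoring rows and the rest."""
    order = np.argsort(scores)[::-1]
    keep_idx = np.sort(order[:keep])   # preserve original token order
    drop_idx = np.sort(order[keep:])
    return states[keep_idx], states[drop_idx]


def orthogonal_backfill(retained, discarded, rank=2, scale=1.0):
    """Add a rank-limited, orthogonal summary of `discarded` to `retained`."""
    # Orthonormal basis (columns) for the span of the retained rows.
    basis, _ = np.linalg.qr(retained.T)
    # Component of each discarded row that the retained span cannot express.
    residual = discarded - discarded @ basis @ basis.T
    # Truncated SVD keeps only the `rank` strongest residual directions.
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    # Assumed injection rule (not from the paper): add the mean residual to each kept row.
    return retained + scale * low_rank.mean(axis=0)


rng = np.random.default_rng(0)
keys = rng.normal(size=(50, 16))             # 50 prompt tokens, head dimension 16
scores = rng.random(50)                      # stand-in importance scores
kept, dropped = evict(keys, scores, keep=8)  # 16% of the prompt states retained
filled = orthogonal_backfill(kept, dropped, rank=2)
```

In this sketch the added term is orthogonal to every retained row, so it carries only directions the retained states could not already express. In practice the same step would presumably be applied to values as well as keys, per layer and per head.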
fields
cs.MA 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents
Heterogeneous agents achieve dense latent KV-cache communication via lightweight cross-model transformation and two-phase training, outperforming text at lower compute in context-aware settings and enabling context-unaware transfer.