When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration

· 2026 · cs.LG · arXiv 2604.13349

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches. However, full KV relay incurs high memory and communication cost. We adapt KV-cache eviction methods to this setting and introduce \textbf{Orthogonal BackFill (OBF)} to mitigate information loss from hard eviction. OBF injects a low-rank orthogonal residual from discarded KV states into the retained KV states. We evaluate OBF against full KV relay on nine benchmarks spanning mathematical reasoning, expert and commonsense QA, and coding. With only 9.9%-20.2% of the prompt KV states retained, H-OBF delivers between $97%$ and $120%$ of full KV relay's per-benchmark accuracy across the nine benchmarks. This suggests that more information does not necessarily lead to better communication; preserving the most useful information matters more. Our codebase is included in the supplementary material. Our codebase is publicly available on https://github.com/markli404/When-Less-Latent-Leads-to-Better-Relay.

representative citing papers

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents

cs.MA · 2026-06-11 · unverdicted · novelty 7.0

Heterogeneous agents achieve dense latent KV-cache communication via lightweight cross-model transformation and two-phase training, outperforming text at lower compute in context-aware settings and enabling context-unaware transfer.

citing papers explorer

Showing 1 of 1 citing paper after filters.

See What I See, Know What I Think: Dense Latent Communication Across Heterogeneous Agents cs.MA · 2026-06-11 · unverdicted · none · ref 15 · internal anchor
Heterogeneous agents achieve dense latent KV-cache communication via lightweight cross-model transformation and two-phase training, outperforming text at lower compute in context-aware settings and enabling context-unaware transfer.

When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration

fields

years

verdicts

representative citing papers

citing papers explorer