pith. sign in

arxiv: 2602.00471 · v2 · pith:WVAJGKVRnew · submitted 2026-01-31 · 💻 cs.AI · cs.CV

Dual Latent Memory for Visual Multi-agent System

classification 💻 cs.AI cs.CV
keywords whileduallatentmulti-agentcollaborationinformationinter-agentmemories
0
0 comments X
read the original abstract

While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive "scaling wall": increasing agent turns often degrades performance while exponentially inflating token costs. We attribute this failure to the information bottleneck inherent in text-centric communication, where converting perceptual and thinking trajectories into discrete natural language inevitably induces semantic loss. To this end, we propose \textbf{L}$\mathbf{^{2}}$\textbf{-VMAS}, a novel model-agnostic framework that enables inter-agent collaboration with dual latent memories. Furthermore, we decouple the perception and thinking while dynamically synthesizing dual latent memories. Additionally, we introduce an entropy-driven proactive triggering that replaces passive information transmission with efficient, on-demand memory access. Extensive experiments among backbones, sizes, and multi-agent structures demonstrate that our method effectively breaks the "scaling wall" with superb scalability, improving average accuracy by 2.7-5.4% while reducing token usage by 21.3-44.8%.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Experience to Skill: Multi-Agent Generative Engine Optimization via Reusable Strategy Learning

    cs.AI 2026-04 unverdicted novelty 7.0

    MAGEO is a multi-agent system that distills validated editing patterns into reusable optimization skills for generative engines, outperforming heuristic baselines on visibility and fidelity via a new benchmark and eva...

  2. SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology

    cs.AI 2026-04 unverdicted novelty 6.0

    SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.

  3. Latent Action Reparameterization for Efficient Agent Inference

    cs.AI 2026-05 unverdicted novelty 5.0

    LAR learns a compact latent action space from trajectories that shortens the effective decision horizon for LLM agents, reducing token count and inference time while preserving task success.