Cost and Accuracy of Long-Term Memory in Distributed Multi-Agent Systems Based on Large Language Models

Benedict Wolff; Jacopo Bennati

arxiv: 2601.07978 · v3 · pith:DBQUGGVXnew · submitted 2026-01-12 · 💻 cs.IR

keywords: accuracy, latency, mem0, cost, full-context, agents, cognee, constraints

Long-term memory (LTM) is fundamental to large language model (LLM)-based agents in the emerging Internet of Agents (IoA), where distributed multi-agent systems (DMAS) span cloud and edge networks. Existing evaluations are typically published by framework providers and focus on token usage and latency, rarely accounting for system-level cost or deployment in DMAS. These gaps are addressed with an independent reproducible testbed that evaluates accuracy, latency, CPU time, peak RAM, disk I/O and network usage in a simulated cloud-edge environment. Three venture capital-funded frameworks spanning vector, graph, and hybrid architectures, namely mem0, Graphiti, and cognee, are compared alongside retrieval-augmented generation (RAG) and full-context baselines on the LoCoMo benchmark under unconstrained and constrained network scenarios. Two clusters emerge: mem0, RAG, and full-context reach 77% to 81% accuracy, while Graphiti and cognee reach only 55% to 56%, a gap driven by retrieval incompleteness rather than reasoning failure. The RAG baseline matches the upper cluster at 8.4 times lower total cost of ownership (TCO) than mem0, and both are the only non-dominated backends on the Pareto frontier. Latency and bandwidth constraints as well as jitter leave retrieval quality unchanged for every backend, while vector-based LTM incurs a modest latency penalty of 4% to 5% under edge-cloud constraints. Compression precision rather than context volume determines LTM accuracy, as full-context forwarding underperforms mem0 despite supplying the entire conversation for each question.
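The abstract's claim that only two backends are non-dominated on the Pareto frontier can be illustrated with a minimal sketch. A backend is dominated when another one is at least as accurate and at least as cheap, and strictly better on one of the two. The backend names and numbers below are hypothetical placeholders chosen for illustration only. They are not the paper's measurements, and the paper's own TCO model is not reproduced here.

```python
from typing import NamedTuple


class Backend(NamedTuple):
    name: str
    accuracy: float  # higher is better
    cost: float      # lower is better (arbitrary cost units)


def dominates(a: Backend, b: Backend) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.accuracy >= b.accuracy and a.cost <= b.cost
    strictly_better = a.accuracy > b.accuracy or a.cost < b.cost
    return no_worse and strictly_better


def pareto_front(items: list[Backend]) -> list[Backend]:
    """Keep every item that no other item dominates (input order preserved)."""
    return [x for x in items if not any(dominates(y, x) for y in items)]


# Hypothetical values, NOT results from the paper.
backends = [
    Backend("cheap", accuracy=0.78, cost=1.0),
    Backend("accurate", accuracy=0.80, cost=8.0),
    Backend("dominated_1", accuracy=0.77, cost=9.0),
    Backend("dominated_2", accuracy=0.55, cost=5.0),
]

front = pareto_front(backends)
print([b.name for b in front])
```

With these placeholder values, the cheap backend and the most accurate backend both survive, because neither beats the other on both axes at once. The other two are dominated by the cheap one.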

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
cs.CL 2026-05 unverdicted novelty 7.0

DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.

Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery
cs.LG 2026-05 unverdicted novelty 6.0

Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.