Cost and Accuracy of Long-Term Memory in Distributed Multi-Agent Systems Based on Large Language Models
read the original abstract
Long-term memory (LTM) is fundamental to large language model (LLM)-based agents in the emerging Internet of Agents (IoA), where distributed multi-agent systems (DMAS) span cloud and edge networks. Existing evaluations are typically published by framework providers and focus on token usage and latency, rarely accounting for system-level cost or deployment in DMAS. These gaps are addressed with an independent reproducible testbed that evaluates accuracy, latency, CPU time, peak RAM, disk I/O and network usage in a simulated cloud-edge environment. Three venture capital-funded frameworks spanning vector, graph, and hybrid architectures, namely mem0, Graphiti, and cognee, are compared alongside retrieval-augmented generation (RAG) and full-context baselines on the LoCoMo benchmark under unconstrained and constrained network scenarios. Two clusters emerge: mem0, RAG, and full-context reach 77% to 81% accuracy, while Graphiti and cognee reach only 55% to 56%, a gap driven by retrieval incompleteness rather than reasoning failure. The RAG baseline matches the upper cluster at 8.4 times lower total cost of ownership (TCO) than mem0, and both are the only non-dominated backends on the Pareto frontier. Latency and bandwidth constraints as well as jitter leave retrieval quality unchanged for every backend, while vector-based LTM incurs a modest latency penalty of 4% to 5% under edge-cloud constraints. Compression precision rather than context volume determines LTM accuracy, as full-context forwarding underperforms mem0 despite supplying the entire conversation for each question.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
-
Evaluating Memory Condensation Strategies for Coding Agents in Data-Driven Scientific Discovery
Empirical evaluation of eight memory condensation strategies on 480 DiscoveryBench tasks finds no significant impact on hypothesis quality but domain-dependent differences in token efficiency.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.