hub

A survey on the memory mechanism of large language model- based agents.ACM Transactions on Information Systems, 43(6):155:1–155:47

Zeyu Zhang, Quanyu Dai, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen · 2025 · DOI 10.1145/3748302

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

open at publisher browse 16 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

citation-role summary

background 3

citation-polarity summary

background 2 unclear 1

representative citing papers

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

cs.CV · 2026-05-11 · unverdicted · novelty 8.0

EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.

EXG: Self-Evolving Agents with Experience Graphs

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench

cs.AI · 2026-04-17 · conditional · novelty 7.0

AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.

Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents

cs.AI · 2026-04-10 · unverdicted · novelty 7.0

CACM improves language-based drug discovery agents by 36.4% via protocol auditing, a grounded diagnostician, and compressed static/dynamic/corrective memory channels that localize failures and bias corrections.

Semantic Level of Detail for Knowledge Graphs: Discovering Abstraction Boundaries via Spectral Heat Diffusion

cs.LG · 2026-03-09 · unverdicted · novelty 7.0

SLoD detects emergent scale boundaries in knowledge graphs by applying spectral heat diffusion to Poincare embeddings, recovering planted hierarchies in synthetic data and aligning with taxonomic depths in WordNet without resolution-parameter tuning.

ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory

cs.AI · 2025-09-29 · conditional · novelty 7.0

ReasoningBank distills generalizable reasoning strategies from agent successes and failures to enable self-evolution, with memory-aware test-time scaling amplifying gains over raw-trajectory or success-only memory on web and software benchmarks.

DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA

cs.CL · 2026-05-21 · unverdicted · novelty 6.0

DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API token cost.

Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models

cs.IR · 2026-05-13 · unverdicted · novelty 6.0

APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.

MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.

Through Their Eyes: Fixation-aligned Tuning for Personalized User Emulation

cs.MM · 2026-04-10 · unverdicted · novelty 6.0

Personalized soft prompts steer VLM attention to match user-specific gaze patterns, yielding better attention alignment and click prediction in recommendation simulations.

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

cs.AI · 2026-06-09 · unverdicted · novelty 5.0

ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.

Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction

cs.AI · 2026-04-29 · unverdicted · novelty 5.0

Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.

Towards Self-Improving Error Diagnosis in Multi-Agent Systems

cs.MA · 2026-04-19 · unverdicted · novelty 5.0

ErrorProbe introduces a self-improving pipeline for attributing semantic failures in LLM multi-agent systems to specific agents and steps via anomaly detection, backward tracing, and tool-grounded validation with verified episodic memory.

VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization

cs.HC · 2025-10-13 · unverdicted · novelty 5.0

VizCopilot integrates topic modeling with document visualization to support user oversight of retrieved context in enterprise chatbots, enabling detection of misalignments and adaptation of prompting strategies.

Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval

cs.IR · 2026-05-11

citing papers explorer

Showing 16 of 16 citing papers.

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding cs.CV · 2026-05-11 · unverdicted · none · ref 42
EgoMemReason is a new benchmark showing that even the best multimodal models achieve only 39.6% accuracy on reasoning tasks that require integrating sparse evidence across days in egocentric video.
EXG: Self-Evolving Agents with Experience Graphs cs.AI · 2026-05-18 · unverdicted · none · ref 54
EXG is an experience graph framework for self-evolving LLM agents that supports online real-time growth and offline reuse to enhance solution quality and efficiency on code generation and reasoning benchmarks.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 84
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Evaluating Tool-Using Language Agents: Judge Reliability, Propagation Cascades, and Runtime Mitigation in AgentProp-Bench cs.AI · 2026-04-17 · conditional · none · ref 31
AgentProp-Bench shows substring judging agrees with humans at kappa=0.049, LLM ensemble at 0.432, bad-parameter injection propagates with ~0.62 probability, rejection and recovery are independent, and a runtime fix cuts hallucinations 23pp on GPT-4o-mini but not Gemini-2.0-Flash.
Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents cs.AI · 2026-04-10 · unverdicted · none · ref 42
CACM improves language-based drug discovery agents by 36.4% via protocol auditing, a grounded diagnostician, and compressed static/dynamic/corrective memory channels that localize failures and bias corrections.
Semantic Level of Detail for Knowledge Graphs: Discovering Abstraction Boundaries via Spectral Heat Diffusion cs.LG · 2026-03-09 · unverdicted · none · ref 29
SLoD detects emergent scale boundaries in knowledge graphs by applying spectral heat diffusion to Poincare embeddings, recovering planted hierarchies in synthetic data and aligning with taxonomic depths in WordNet without resolution-parameter tuning.
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory cs.AI · 2025-09-29 · conditional · none · ref 28
ReasoningBank distills generalizable reasoning strategies from agent successes and failures to enable self-evolution, with memory-aware test-time scaling amplifying gains over raw-trajectory or success-only memory on web and software benchmarks.
DeferMem: Query-Time Evidence Distillation via Reinforcement Learning for Long-Term Memory QA cs.CL · 2026-05-21 · unverdicted · none · ref 58
DeferMem decouples memory QA into high-recall retrieval and RL-based query-conditioned evidence distillation, outperforming baselines on LoCoMo and LongMemEval-S with highest accuracy, fastest runtime, and zero API token cost.
Task-Aware Automated User Profile Generation for Recommendation Simulation Using Large Language Models cs.IR · 2026-05-13 · unverdicted · none · ref 49
APG4RecSim automatically generates realistic user profiles for LLM-based recommendation simulations, outperforming manual baselines by up to 7% in nDCG@10 and 8% in JSD on three benchmark datasets.
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents cs.CL · 2026-05-01 · unverdicted · none · ref 25
A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
Through Their Eyes: Fixation-aligned Tuning for Personalized User Emulation cs.MM · 2026-04-10 · unverdicted · none · ref 45
Personalized soft prompts steer VLM attention to match user-specific gaze patterns, yielding better attention alignment and click prediction in recommendation simulations.
ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning cs.AI · 2026-06-09 · unverdicted · none · ref 28
ActiveMem proposes a heterogeneous distributed memory framework for LLM agents that separates planning from active memory management, reporting SOTA accuracy with lower overhead on BrowseComp-Plus and GAIA.
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction cs.AI · 2026-04-29 · unverdicted · none · ref 30
Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.
Towards Self-Improving Error Diagnosis in Multi-Agent Systems cs.MA · 2026-04-19 · unverdicted · none · ref 85
ErrorProbe introduces a self-improving pipeline for attributing semantic failures in LLM multi-agent systems to specific agents and steps via anomaly detection, backward tracing, and tool-grounded validation with verified episodic memory.
VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization cs.HC · 2025-10-13 · unverdicted · none · ref 59
VizCopilot integrates topic modeling with document visualization to support user oversight of retrieved context in enterprise chatbots, enabling detection of misalignments and adaptation of prompting strategies.
Structured Belief State and the First Precision-Aware Benchmark for LLM Memory Retrieval cs.IR · 2026-05-11 · unreviewed · ref 10

A survey on the memory mechanism of large language model- based agents.ACM Transactions on Information Systems, 43(6):155:1–155:47

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer