{"total":14,"items":[{"citing_arxiv_id":"2605.25971","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents","primary_cat":"cs.CL","submitted_at":"2026-05-25T15:47:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ProAct uses idle compute to anticipate user needs via dialogue history and memory, achieving 14.8% fewer turns, 11.7% less user effort, and 28.1% fewer hallucinations than reactive baselines on the new ProActEval benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15710","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory","primary_cat":"cs.CL","submitted_at":"2026-05-15T08:00:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13941","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EvolveMem:Self-Evolving Memory Architecture via AutoResearch for LLM Agents","primary_cat":"cs.LG","submitted_at":"2026-05-13T17:12:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EvolveMem enables autonomous self-evolution of LLM memory retrieval configurations via LLM diagnosis and safeguards, delivering 25.7% gains over strong baselines on LoCoMo and 18.9% on MemBench with 
positive cross-benchmark transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09942","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution","primary_cat":"cs.AI","submitted_at":"2026-05-11T03:41:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16839","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HeLa-Mem: Hebbian Learning and Associative Memory for LLM Agents","primary_cat":"cs.CL","submitted_at":"2026-04-18T05:11:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HeLa-Mem is a graph-based memory architecture for LLM agents that applies Hebbian learning to episodic associations and distills hubs into semantic knowledge, yielding better results on long-context benchmarks with fewer tokens.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06845","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues","primary_cat":"cs.CL","submitted_at":"2026-04-08T09:07:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HingeMem segments dialogue memory via boundary-triggered hyperedges over four elements and 
applies query-adaptive retrieval, yielding ~20% relative gains and 68% lower QA token cost versus baselines on LOCOMO.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.21046","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence","primary_cat":"cs.AI","submitted_at":"2025-07-28T17:59:05+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.15965","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs","primary_cat":"cs.IR","submitted_at":"2025-04-22T15:05:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"MemInsight [50] Management MemoChat [51], MemoryBank [17], RMM [52], LD-Agent [53], A-MEM [54], Generative Agents [55], EMG-RAG [56], KGT [46], LLM-Rsum [57], COMEDY [58] Retrieval RET-LLM [44], ChatDB [59], Human-like Memory [60], HippoRAG [13], HippoRAG 2 [61], EgoRAG [62], MemInsight [50] Usage MemoCRS [63], RecMind [64], RecAgent [65], InteRecAgent [66], SCM [67], ChatDev [68], MetaAgents [69], S3 
[70], TradingGPT [71], Memolet [72], Synaptic Resonance [14], MemReasoner [73] Benchmark MADial-Bench [74], LOCOMO [75], MemDaily [76], ChMapData [77], MSC [78], MMRC [79], Ego4D [80], EgoLife [62], BABILong [81, 82] III Personal Parametric Short-Term Caching for Acceleration Prompt Cache [83], Contextual Retrieval [84]"},{"citing_arxiv_id":"2502.12110","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A-MEM: Agentic Memory for LLM Agents","primary_cat":"cs.CL","submitted_at":"2025-02-17T18:36:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A-MEM is a dynamic memory system for LLM agents that builds and refines an interconnected network of notes with agent-driven linking and evolution, showing performance gains over prior memory methods on six models.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"Prior works on LLM agent memory systems have explored various mechanisms for memory management and utilization [23, 21, 8, 39]. Some approaches complete interaction storage, which maintains comprehensive historical records through dense retrieval models [39] or read-write memory structures [24]. Moreover, MemGPT [25] leverages cache-like architectures to prioritize recent information. Similarly, SCM [32] proposes a Self-Controlled Memory framework that enhances LLMs' capability to maintain long-term memory through a memory stream and controller mechanism. However, these approaches face significant limitations in handling diverse real-world tasks. 
While they can provide basic memory functionality, their operations are typically constrained by predefined"},{"citing_arxiv_id":"2404.13501","ref_index":98,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey on the Memory Mechanism of Large Language Model based Agents","primary_cat":"cs.AI","submitted_at":"2024-04-21T01:49:46+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"the information across different trials, and the external knowledge. The former two are dynamically 11 Table 1: Summarization of the memory sources. We use ✓ and × to label whether or not the corresponding source is adopted in the model. Models Inside-trial Information Cross-trial Information External Knowledge MemoryBank [6] ✓ × × RET-LLM [7] ✓ × ✓ ChatDB [96] ✓ × ✓ TiM [97] ✓ × × SCM [98] ✓ × × Voyager [99] ✓ × × MemGPT [100] ✓ × × MemoChat [94] ✓ × × MPC [101] ✓ × × Generative Agents [83] ✓ × × RecMind [102] ✓ × ✓ Retroformer [103] ✓ ✓ ✓ ExpeL [82] ✓ ✓ ✓ Synapse [91] ✓ ✓ × GITM [93] ✓ ✓ ✓ ReAct [104] ✓ × ✓ Reflexion [5] ✓ ✓ ✓ RecAgent [95] ✓ × × Character-LLM [105] ✓ × ✓ MAC [106] ✓ × × Huatuo [107] ✓ × ✓ ChatDev [1] ✓ × × InteRecAgent [108] ✓ × ✓"},{"citing_arxiv_id":"2402.17753","ref_index":136,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating Very Long-Term Conversational Memory of LLM Agents","primary_cat":"cs.CL","submitted_at":"2024-02-27T18:42:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and 
long-range temporal dynamics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.07864","ref_index":169,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Rise and Potential of Large Language Model Based Agents: A Survey","primary_cat":"cs.AI","submitted_at":"2023-09-14T17:12:03+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[158], Mitchell et al. [159], etc. Mitigate hallucination Manakul et al. [160], Qin et al. [94], Li et al. [161], Gou et al. [162], etc. Memory §3.1.3 Memory capability Raising the length limit of Transformers BART [163], Park et al. [164], LongT5 [165], CoLT5 [166], Ruoss et al. [167], etc. Summarizing memory Generative Agents [22], SCM [168], Reflexion [169], Memory- bank [170], ChatEval [171], etc. Compressing memories with vectors or data structures ChatDev [109], GITM [172], RET-LLM [173], AgentSims [174], ChatDB [175], etc. Memory retrieval Automated retrieval Generative Agents [22], Memory- bank [170], AgentSims [174], etc. Interactive retrieval Memory Sandbox[176], ChatDB [175], etc. 
Reasoning &"},{"citing_arxiv_id":"2308.14508","ref_index":98,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding","primary_cat":"cs.CL","submitted_at":"2023-08-28T11:53:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"LongBench is the first bilingual multi-task benchmark for long context understanding in LLMs, containing 21 datasets in 6 categories with average lengths of 6711 words (English) and 13386 characters (Chinese).","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2308.11432","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey on Large Language Model based Autonomous Agents","primary_cat":"cs.AI","submitted_at":"2023-08-22T13:30:37+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Long-term memory provides stable knowledge, while short-term memory allows flexible planning. Reflexion [12] utilizes a short-term sliding window to capture recent feedback and incorporates persistent long-term storage to retain condensed insights. This combination allows for the utilization of both detailed immediate experiences and high-level abstractions. SCM [35] selectively activates the most relevant long-term knowledge to combine with short-term memory, enabling reasoning over complex contextual dialogues. 
SimplyRetrieve [36] utilizes user queries as short-term memory and stores long-term memory using private knowledge bases. This design enhances the model accuracy while guaranteeing user privacy."}],"limit":50,"offset":0}