{"total":14,"items":[{"citing_arxiv_id":"2607.01935","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A-TMA: Decoupling State-Aware Memory Failures in Long-Term Agent Memory","primary_cat":"cs.AI","submitted_at":"2026-07-02T09:28:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ATMA adds state labels and evidence packets to existing memory systems to reduce ghost memory failures, with reported gains on a new LTP benchmark and LoCoMo.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30949","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AgRefactor: Self-Evolving Agentic Workflow for HLS Compatibility and Performance","primary_cat":"cs.AI","submitted_at":"2026-06-29T22:02:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AgRefactor deploys a self-evolving multi-agent workflow that combines LLM rewrites with automated tools to convert software into HLS code, matching or beating baselines on long benchmarks and delivering 6.51x geometric mean speedup after optimization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.25161","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TRUSTMEM: Learning Trustworthy Memory Consolidation for LLM Agents with Long-Term Memory","primary_cat":"cs.AI","submitted_at":"2026-06-23T20:49:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"TrustMem introduces a verifier for memory update transitions and preference-guided RL to cut omission, corruption, and hallucination rates in LLM agent memory while reaching SOTA on MemoryAgentBench and 
HaluMem.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20295","ref_index":111,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Token-Operations-Oriented Inference Optimization Techniques for Large Models","primary_cat":"cs.SE","submitted_at":"2026-06-18T14:33:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper introduces a four-layer technical architecture for token-operations-oriented inference optimization in large models and reviews key technologies and industry status at each layer.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Academic research has also proposed various KV cache compression methods based on importance and low-bit representations. Scissorhands, H_2O, and SnapKV each retain more critical historical states from the perspectives of importance persistence, Heavy Hitter Tokens, and pre-generated attention patterns, compressing the KV cache under a fixed budget [109-111]. KIVI, KVZip, and TurboQuant further reduce the cost of cache representation via KV cache quantization or context reconstruction [112-114]. A common challenge for these approaches is balancing compression ratio, long-context quality, decoding latency, and kernel support. 
Complementing KV cache compression, context compression techniques are becoming important supplements for"},{"citing_arxiv_id":"2606.17328","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MemTrace: Probing What Final Accuracy Misses in Long-Term Memory","primary_cat":"cs.AI","submitted_at":"2026-06-15T22:21:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemTrace shows that evidence utilization, not retrieval, is the dominant failure mode in LLM long-term memory systems across tested configurations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05894","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"EMBER: Efficient Memory via Budgeted Evidence Retention for Long-Horizon Agents","primary_cat":"cs.CL","submitted_at":"2026-06-04T09:03:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EMBER learns to retain source-backed evidence capsules under a fixed token budget, improving F1, Retain-Recall, and Read-Recall on LongMemEval-RR over budgeted baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.01223","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue","primary_cat":"cs.CL","submitted_at":"2026-05-31T13:16:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RefMem-Bench benchmarks reflective memory in dialogue with 26K instances across eight dimensions, and REMIND improves model accuracy via hierarchical evidence retrieval, grounding, and 
abstraction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.28773","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Memory as Continuously Evolving Connectivity","primary_cat":"cs.CL","submitted_at":"2026-05-27T17:35:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FluxMem evolves memory as a heterogeneous graph via three refinement stages and reports consistent state-of-the-art results on LoCoMo, Mind2Web, and GAIA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25680","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Simulating Human Memory with Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-25T10:39:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models show superior memory to humans on psychology experiments but can be adjusted via prompting and compaction to forget in a more human-like way, yielding better user simulators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06169","ref_index":56,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"In-Place Test-Time Training","primary_cat":"cs.LG","submitted_at":"2026-04-07T17:59:44+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"In-Place TTT adapts LLM MLP projection matrices at test time with a next-token-aligned objective and chunk-wise updates, enabling better long-context performance as a drop-in enhancement.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Test-time regression: a 
unifying framework for designing sequence models with associative memory. arXiv preprint arXiv:2501.12352, 2025. 14 [55] Yu Wang, Yifan Gao, Xiusi Chen, Haoming Jiang, Shiyang Li, Jingfeng Yang, Qingyu Yin, Zheng Li, Xian Li, Bing Yin, Jingbo Shang, and Julian McAuley. Memoryllm: Towards self-updatable large language models, 2024. URL https://arxiv.org/abs/2402.04624. [56] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2023. URL https://arxiv.org/abs/2201.11903. [57] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al."},{"citing_arxiv_id":"2602.13933","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling","primary_cat":"cs.AI","submitted_at":"2026-02-15T00:06:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines at 92.6% lower computational cost on LOCOMO and LongMemEval benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.02805","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning","primary_cat":"cs.CL","submitted_at":"2025-11-04T18:27:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, 
outperforming ReAct-style baselines with stable token counts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.05257","ref_index":38,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions","primary_cat":"cs.CL","submitted_at":"2025-07-07T17:59:54+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.15965","ref_index":148,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs","primary_cat":"cs.IR","submitted_at":"2025-04-22T15:05:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"The paper surveys human memory categories, maps them to LLM memory, and proposes a new three-dimension (object, form, time) categorization into eight quadrants to organize existing work and highlight open problems.","context_count":1,"top_context_role":"method","top_context_polarity":"background","context_text":"RAGCache [131], SGLang [132], Ada-KV [133], HCache [134], Cake [135], EPIC [136], RelayAttention [137], Marconi [138], IKS [139], FastCache [140], Cache-Craft [141], KVLink [142], RAGServe [143], BumbleBee [144] VIII System Parametric Long-Term Parametric Memory Structures Memorizing Transformer [145], Focused Transformer [146], MAC [147], MemoryLLM [148], WISE [149], LongMem [150], LM2 [151], Titans [152] Table 3: System Memory 12 4.1 Contextual System Memory From a temporal perspective, non-parametric short-term system memory refers to a series of reasoning and action results generated by large language models during task execution. 
This form of memory supports enhanced reasoning and planning within the context of the current task, thereby"}],"limit":50,"offset":0}