pith. machine review for the scientific record.

arxiv: 2605.03804 · v1 · submitted 2026-05-05 · 💻 cs.AI

Recognition: unknown

ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

Jiale Chang, Yuxiang Ren


Pith reviewed 2026-05-07 16:18 UTC · model grok-4.3

classification 💻 cs.AI
keywords on-device memory · LLM agents · memory compression · optical forgetting · episodic memory graph · multimodal memory · personalized agents · edge AI

The pith

ScrapMem lets LLM agents keep long-term multimodal memories on edge devices by progressively lowering the resolution of old entries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to solve the storage and complexity problems that prevent LLM agents from maintaining useful personalized memory over long periods when running on phones or other limited hardware. It does this by turning incoming multimodal data into scrapbook-style pages, then applying optical forgetting to shrink the detail level of older pages while keeping the most recent ones intact. An Episodic Memory Graph links the remaining entries into a causal timeline so the agent can still retrieve relevant past events efficiently. Experiments on the ATM-Bench dataset show the method reaches a new best Joint@10 score of 51.0 percent, cuts memory use by as much as 93 percent, and lifts Recall@10 to 70.3 percent. If the approach works as described, on-device agents could retain weeks or months of personal context without needing constant cloud uploads or oversized local storage.

Core claim

ScrapMem integrates multimodal inputs into Scrapbook Pages, applies optical forgetting that progressively reduces resolution of older memories to cut storage cost while suppressing low-value details, and builds an Episodic Memory Graph to preserve causal-temporal relationships among key events; on the multimodal ATM-Bench this yields 51.0 percent Joint@10, up to 93 percent lower memory usage, and 70.3 percent Recall@10.
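
The staged compression in this claim can be sketched as an age-indexed downscaling policy. This is a minimal illustration: the tier boundaries and scale factors below are invented for the sketch, not the paper's reported settings.

```python
# Illustrative age tiers (in days) and resolution scale factors; the
# paper's actual stage boundaries and factors are not given in this summary.
TIERS = [
    (7, 1.0),      # recent: keep full resolution
    (30, 0.5),     # mid-term: halve each dimension
    (None, 0.25),  # old: quarter each dimension
]

def target_scale(age_days: float) -> float:
    """Map a memory's age to its resolution scale factor."""
    for boundary, scale in TIERS:
        if boundary is None or age_days <= boundary:
            return scale

def forgotten_dims(width: int, height: int, age_days: float) -> tuple:
    """Pixel dimensions after this toy optical-forgetting step."""
    s = target_scale(age_days)
    return max(1, round(width * s)), max(1, round(height * s))
```

Quartering both dimensions keeps roughly 6% of the pixels, the same order of magnitude as the reported 93% storage reduction, though the real mechanism also adjusts image quality, not just resolution.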

What carries the argument

Optical Forgetting, a progressive resolution-reduction step applied to older memories, supported by an Episodic Memory Graph that links events in causal-temporal order to keep retrieval accurate after compression.
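
The graph side of that machinery can be pictured as a small causal-temporal store. This is an illustrative sketch only; the paper's actual EM-Graph construction is not specified in this summary.

```python
from collections import defaultdict

class EMGraphSketch:
    """Toy episodic memory graph: events keyed by id, with directed
    cause -> effect edges constrained to respect timestamps."""

    def __init__(self):
        self.events = {}                  # id -> (timestamp, summary)
        self.effects = defaultdict(list)  # cause id -> effect ids

    def add_event(self, eid, timestamp, summary):
        self.events[eid] = (timestamp, summary)

    def link(self, cause, effect):
        # Causal edges must run forward in time.
        if self.events[cause][0] > self.events[effect][0]:
            raise ValueError("cause cannot follow its effect")
        self.effects[cause].append(effect)

    def episode_from(self, eid):
        """Walk causal links forward to recover an event chain."""
        chain = [eid]
        while self.effects[chain[-1]]:
            chain.append(self.effects[chain[-1]][0])
        return chain
```

Because retrieval follows edges rather than raw content, a chain like this stays recoverable even after the individual pages it links have been compressed.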

If this is right

  • Agents running locally can sustain much longer interaction histories without exhausting device storage.
  • Structured graph aggregation raises the chance that relevant past episodes are retrieved even after compression.
  • Multimodal on-device agents become practical for personalized tasks without constant data transfer.
  • Memory management can shift from keeping everything to selectively discarding detail in a controlled way.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same forgetting pattern could be tested on non-LLM memory systems such as robotic state trackers to see whether resolution reduction still preserves task-critical information.
  • Real-device measurements of power draw and latency after applying optical forgetting would show whether the storage savings translate into usable runtime gains.
  • Extending the Episodic Memory Graph with explicit decay rates might allow further tuning of how quickly older events lose detail.
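
The decay-rate extension in the last bullet could be prototyped as a continuous schedule in place of discrete stages. This is a hypothetical variant: `half_life_days` and `floor` are invented tuning knobs, not parameters from the paper.

```python
def decayed_scale(age_days: float, half_life_days: float = 30.0,
                  floor: float = 0.1) -> float:
    """Exponentially shrink resolution with age, never below `floor`.

    Hypothetical extension: the paper uses discrete temporal stages,
    not continuous decay; this shows where a tunable rate would go.
    """
    return max(0.5 ** (age_days / half_life_days), floor)
```

A per-event half-life would then let flagged-important episodes fade more slowly than routine ones.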

Load-bearing premise

Lowering the resolution of older memories keeps their semantic content usable and does not erase or distort important multimodal details that the agent will later need.

What would settle it

A controlled test that probes whether compression by optical forgetting makes the agent answer incorrectly questions about past events it handled correctly before compression; performance falling below the reported baseline would refute the load-bearing premise.
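
Such a test could be organized as a simple before/after comparison over a QA set. This is a hypothetical harness: `answer_fn` stands in for whatever question-answering interface the agent exposes.

```python
def forgetting_damage(qa_pairs, answer_fn, full_memory, compressed_memory):
    """Fraction of questions the agent answered correctly from full
    memory that it gets wrong after optical forgetting is applied."""
    broken = correct_before = 0
    for question, gold in qa_pairs:
        if answer_fn(full_memory, question) == gold:
            correct_before += 1
            if answer_fn(compressed_memory, question) != gold:
                broken += 1
    return broken / correct_before if correct_before else 0.0
```

A damage fraction large enough to drag accuracy below the reported baseline would be the refuting observation.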

Figures

Figures reproduced from arXiv: 2605.03804 by Jiale Chang, Yuxiang Ren.

Figure 1: Comparison between human memory (CLS theory) and Scrapbook Memory. Top: The hippocampus rapidly encodes multimodal episodic experiences, while the neocortex gradually consolidates them into stable long-term knowledge. Bottom: ScrapMem similarly binds heterogeneous user data into scrapbook pages and progressively compresses old memories via optical forgetting, preserving core semantics for efficient retrie…
Figure 2: Overview of ScrapMem. (1) Consolidation and Perception: Unifies heterogeneous records (images, videos, text) into hybrid representations via OCR and vision-to-text extraction. (2) EM-Graph Construction: Organizes nodes into an Episodic Memory Graph with event-centric paths (EM-Paths) for structured retrieval and multi-hop reasoning. (3) Optical Forgetting: Compresses outdated memories through temporal …
Figure 3: Retrieval performance (Recall@K) under varying optical forgetting intensities: quality (Q), resolution scaling factor (S), and temporal stage boundaries (T) for Recent, Mid-term, and Old memories, respectively. The clustering of the different forgetting curves demonstrates that ScrapMem is highly robust to specific hyperparameter configurations.
Figure 4: Storage–performance trade-off on ATM-Bench (Joint@10). The x-axis uses a logarithmic scale. ScrapMem (Timed-Gentle, orange star) reduces storage by 93.0% relative to the raw-data baseline while retaining over 90% of SOTA performance (46.3% vs. 51.0%). The Pareto frontier indicates strong efficiency and graceful performance degradation, supporting on-device deployment.
Original abstract

Long-term personalized memory for LLM agents is challenging on resource-limited edge devices due to high storage costs and multimodal complexity. To address this, we propose ScrapMem, a framework that integrates multimodal data into "Scrapbook Page." ScrapMem introduces Optical Forgetting, an optical compression mechanism that progressively reduces the resolution of older memories, lowering storage cost while suppressing low-value details. To maintain semantic consistency, we construct an Episodic Memory Graph (EM-Graph) that organizes key events into a causal-temporal structure. Extensive experiments on the multimodal ATM-Bench showcase that ScrapMem provides three main benefits: (1) strong performance, achieving a new state-of-the-art with a 51.0% Joint@10 score; (2) high storage efficiency, reducing memory usage by up to 93% via optical forgetting; and (3) improved recall, increasing Recall@10 to 70.3% through structured aggregation. ScrapMem offers an effective and storage-efficient solution for on-device long-term memory in multimodal LLM agents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes ScrapMem, a bio-inspired framework for on-device long-term personalized memory in multimodal LLM agents. It integrates multimodal data into 'Scrapbook Page' structures, introduces Optical Forgetting as a progressive resolution-reduction mechanism for older memories to cut storage costs, and builds an Episodic Memory Graph (EM-Graph) to enforce causal-temporal organization of key events. Experiments on the multimodal ATM-Bench are reported to deliver a new SOTA of 51.0% Joint@10, up to 93% memory reduction, and 70.3% Recall@10 via structured aggregation.

Significance. If the empirical results hold after proper validation, ScrapMem would represent a meaningful advance for resource-constrained edge agents by addressing the tension between long-term multimodal memory and storage limits. The combination of bio-inspired compression with graph-structured retention is conceptually appealing and could influence subsequent work on efficient agent memory. No machine-checked proofs, reproducible code artifacts, or parameter-free derivations are present to credit.

major comments (3)
  1. Abstract: The central performance claims (51.0% Joint@10 SOTA, 93% storage reduction, 70.3% Recall@10) are asserted without any description of baselines, experimental setup, error bars, statistical significance, or implementation details of Optical Forgetting, making it impossible to verify support for the claims from the available text.
  2. Method section on Optical Forgetting: The mechanism that progressively lowers resolution of older memories is described only at a high level; no concrete algorithm, information-loss metrics, or ablations isolating its effect on semantic consistency and multimodal fidelity are supplied, which is load-bearing for both the efficiency and recall claims.
  3. Experiments / ATM-Bench results: No quantitative evidence (e.g., retention metrics, consistency scores, or ablation tables) is given to substantiate that the Episodic Memory Graph preserves causal-temporal structure and critical multimodal details under Optical Forgetting; without these, the 93% reduction could mask unmeasured recall degradation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important areas for improving clarity and substantiation of our claims. We address each major comment point by point below and commit to revisions that will strengthen the manuscript without altering its core contributions.

Point-by-point responses
  1. Referee: Abstract: The central performance claims (51.0% Joint@10 SOTA, 93% storage reduction, 70.3% Recall@10) are asserted without any description of baselines, experimental setup, error bars, statistical significance, or implementation details of Optical Forgetting, making it impossible to verify support for the claims from the available text.

    Authors: We agree that the abstract, as a concise summary, omits these supporting details. The full manuscript provides baselines and setup in Section 4.1, error bars and significance testing in the results tables of Section 4, and Optical Forgetting implementation in Section 3.2. To address the concern directly, we will revise the abstract to include a brief reference to the primary baselines (e.g., standard retrieval and memory-augmented agents), the ATM-Bench evaluation protocol, and a note that detailed metrics and ablations appear in the experiments section. This change will make the performance claims more self-contained while preserving the abstract's brevity. revision: yes

  2. Referee: Method section on Optical Forgetting: The mechanism that progressively lowers resolution of older memories is described only at a high level; no concrete algorithm, information-loss metrics, or ablations isolating its effect on semantic consistency and multimodal fidelity are supplied, which is load-bearing for both the efficiency and recall claims.

    Authors: The current description emphasizes the bio-inspired motivation and high-level progressive reduction process. We acknowledge that a more concrete specification is needed to support the efficiency and fidelity claims. In the revised manuscript, we will expand Section 3.2 to include the explicit algorithm (step-wise resolution scaling with modality-specific parameters), quantitative information-loss metrics (e.g., embedding similarity and perceptual quality scores), and dedicated ablation tables isolating Optical Forgetting's contribution to storage reduction versus semantic consistency. These additions will directly substantiate the 93% reduction claim. revision: yes

  3. Referee: Experiments / ATM-Bench results: No quantitative evidence (e.g., retention metrics, consistency scores, or ablation tables) is given to substantiate that the Episodic Memory Graph preserves causal-temporal structure and critical multimodal details under Optical Forgetting; without these, the 93% reduction could mask unmeasured recall degradation.

    Authors: The reported results focus on end-to-end Joint@10 and Recall@10 metrics on ATM-Bench. We recognize that explicit evidence linking the EM-Graph to structure preservation under forgetting is required to rule out hidden degradation. We will add, in the revised experiments section, quantitative retention metrics (causal edge preservation rates and multimodal detail fidelity scores), consistency scores across forgetting levels, and ablation tables comparing performance with and without the EM-Graph. These will demonstrate that the observed recall improvements and storage savings are not achieved at the expense of unmeasured structural loss. revision: yes
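
The embedding-similarity information-loss metric promised in response 2 could be computed roughly as below. This is a sketch: the encoder producing the embeddings and any acceptance threshold are the authors' choices, not specified in the available text.

```python
import math

def semantic_retention(before: list, after: list) -> float:
    """Cosine similarity between embeddings of a memory before and
    after optical forgetting; 1.0 means no measurable semantic drift."""
    dot = sum(a * b for a, b in zip(before, after))
    norm = (math.sqrt(sum(a * a for a in before))
            * math.sqrt(sum(b * b for b in after)))
    return dot / norm if norm else 0.0
```

Tracking this score across forgetting stages would make the "suppresses low-value details" claim directly measurable.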

Circularity Check

0 steps flagged

No significant circularity: empirical results independent of inputs

full rationale

The paper proposes ScrapMem with Optical Forgetting for progressive resolution reduction and an Episodic Memory Graph for causal-temporal organization, then reports experimental outcomes on ATM-Bench including 51.0% Joint@10, 70.3% Recall@10, and up to 93% storage reduction. No equations, parameter fits, or derivations are present that reduce any claimed prediction or result to the inputs by construction. Claims rest on external benchmark evaluation rather than self-definitional loops, fitted-input renamings, or load-bearing self-citations, rendering the chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The framework introduces new mechanisms without citing prior independent evidence for their effectiveness; relies on the assumption that multimodal data can be progressively compressed while retaining utility.

axioms (1)
  • domain assumption Multimodal memories can be progressively reduced in resolution without losing semantic value for agent tasks
    Invoked to justify optical forgetting as a viable compression strategy.
invented entities (2)
  • Optical Forgetting no independent evidence
    purpose: Progressively reduce resolution of older memories to lower storage cost
    New compression mechanism central to the efficiency claim
  • Episodic Memory Graph (EM-Graph) no independent evidence
    purpose: Organize key events into causal-temporal structure for consistency
    New structure to maintain semantic consistency during compression

pith-pipeline@v0.9.0 · 5477 in / 1394 out tokens · 58629 ms · 2026-05-07T16:18:31.815932+00:00 · methodology

