pith. machine review for the scientific record.

arxiv: 2604.07894 · v1 · submitted 2026-04-09 · 💻 cs.CL · cs.AI


TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation


Pith reviewed 2026-05-10 16:45 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords long-horizon personalization · memory evolution · context distillation · self-learning · personalized LLMs · memory-augmented systems · train-inference gap

The pith

TSUBASA improves long-horizon personalization in language models by evolving memory dynamically and using self-learning with context distillation to internalize user experiences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Personalized large language models struggle to maintain coherence across extended user histories because existing memory systems either overwrite important details or incur high computational costs. TSUBASA tackles this with a dual strategy: dynamic memory evolution that updates stored information as behaviors change, paired with self-learning that distills past contexts into the model parameters. The result is claimed to surpass prior memory-augmented baselines on long-horizon tasks while using fewer tokens, addressing both the quality-efficiency tradeoff and the lack of labeled data for adaptation.
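The writing-side idea, revising stored entries as behavior changes rather than appending raw history, can be sketched with a toy store. The schema and names below are illustrative, not TSUBASA's actual evolution algorithm:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    """One distilled observation about the user (illustrative schema)."""
    value: str
    turn: int  # conversation turn at which it was last updated

class EvolvingMemory:
    """Toy memory whose entries evolve in place as behavior changes,
    rather than accumulating stale raw utterances."""

    def __init__(self):
        self.store = {}  # topic -> MemoryEntry

    def write(self, topic, value, turn):
        # Evolution step: newer information about the same topic replaces
        # the old entry, keeping the store compact and current.
        self.store[topic] = MemoryEntry(value, turn)

    def read(self, topic):
        entry = self.store.get(topic)
        return entry.value if entry else None

mem = EvolvingMemory()
mem.write("diet", "vegetarian", turn=3)
mem.write("diet", "vegan", turn=120)      # the preference evolved
assert mem.read("diet") == "vegan"        # reads reflect current behavior
assert len(mem.store) == 1                # no stale duplicate entries
```

An append-only log would instead retain both entries and force the reader to arbitrate between them at retrieval time, which is one face of the quality-efficiency tradeoff the paper targets.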

Core claim

TSUBASA is a two-pronged method: it enhances memory writing through dynamic evolution of stored user information, and it improves memory reading through self-learning driven by a context distillation objective. This lets the model internalize extensive user histories without external labels, closing the train-inference gap for long-horizon personalization.

What carries the argument

Dynamic memory evolution for updating stored experiences combined with a context distillation objective that drives self-supervised internalization of user history during reading.
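The context distillation objective presumably follows the standard recipe (Snell et al., reference [58]): a teacher forward pass conditioned on the full user history yields next-token distributions, and a student pass without that history is trained to match them via KL divergence. A minimal numpy sketch, with shapes and names chosen for illustration rather than taken from the paper:

```python
import numpy as np

def log_softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def context_distillation_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) over sequence positions.
    teacher_logits: (seq, vocab), conditioned on the full user context.
    student_logits: (seq, vocab), same model without the context in the
    prompt; minimizing this pushes the context into the parameters."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    p = np.exp(log_p)
    return float((p * (log_p - log_q)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))        # 4 positions, vocab of 10
assert np.isclose(context_distillation_loss(teacher, teacher), 0.0)
assert context_distillation_loss(teacher, rng.normal(size=(4, 10))) > 0.0
```

No labels appear anywhere: the teacher's own distributions are the supervision signal, which is the sense in which the adaptation is self-learned.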

If this is right

  • TSUBASA outperforms competitive memory-augmented systems such as Mem0 and Memory-R1 on long-horizon personalization benchmarks.
  • The approach achieves Pareto improvements by delivering higher fidelity personalization at a reduced token budget.
  • Effectiveness holds across the Qwen-3 model family ranging from 4B to 32B parameters.
  • Self-learning bridges the train-inference gap, enabling adaptation without additional labeled data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The memory evolution component could be tested in domains requiring ongoing tracking, such as multi-session project assistance or longitudinal health coaching.
  • Context distillation might be combined with retrieval-augmented generation to further lower token costs in very long contexts.
  • If the self-learning step generalizes, it could reduce the frequency of explicit user feedback needed to maintain personalization over time.

Load-bearing premise

Self-learning via context distillation can reliably internalize evolving user experiences and close the train-inference gap without introducing inconsistencies or requiring labeled data.

What would settle it

On the same long-horizon benchmarks and model sizes, a version of TSUBASA without the context-distillation self-learning component would either fail to exceed the performance of Mem0 and Memory-R1, or would require an equal or greater token budget for comparable fidelity.

Figures

Figures reproduced from arXiv: 2604.07894 by Lu Wang, Xinliang Frederick Zhang.

Figure 1: Overview of the TSUBASA framework, built on two synergistic wings. Dynamic memory writing applies structured algorithmic evolution based on high-density observations distilled from raw utterances (Section 4.1). Internalized memory reading adopts the self-learning pipeline and applies a context distillation objective on synthetic data (Section 4.2). During inference, observations are retrieved for factual ground…
Figure 2: Quality-efficiency tradeoff between input length and F1 metric on baselines and …
Figure 3: Impact of top-d truncation for approximating KL divergence on distillation quality (F1) with TSUBASA variants. The best-performing configuration is circled for each model size. Importantly, we find d = 1 inadequate, while increasing d beyond 5 leads to performance degradation due to noise in the long-tail distribution, establishing d = 5 as the sweet spot. See Figure A3 and Figure A4 for impacts on ROUGE…
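The top-d truncation in Figure 3 presumably approximates the full-vocabulary KL by keeping only the teacher's d most probable tokens and renormalizing; the paper's exact formulation may differ. One common variant:

```python
import numpy as np

def top_d_kl(p, q, d=5):
    """Approximate KL(p || q) restricted to the teacher's top-d tokens.
    p, q: probability vectors over the vocabulary. Truncation drops the
    noisy long tail; note that in this renormalized form d = 1 makes the
    loss vanish identically, one way to see why Figure 3 finds d = 1
    inadequate."""
    idx = np.argsort(p)[-d:]         # teacher's d most probable tokens
    p_top = p[idx] / p[idx].sum()    # renormalize the kept mass
    q_top = q[idx] / q[idx].sum()
    return float((p_top * np.log(p_top / q_top)).sum())

rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(50))       # teacher next-token distribution
q = rng.dirichlet(np.ones(50))       # student next-token distribution
assert np.isclose(top_d_kl(p, p), 0.0)   # identical distributions: zero loss
assert top_d_kl(p, q) >= 0.0             # a valid KL, hence nonnegative
```

Because both truncated vectors are renormalized, the result is a genuine KL between two d-way distributions, so the nonnegativity and zero-at-equality properties of the full objective are preserved.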
read the original abstract

Personalized large language models (PLLMs) have garnered significant attention for their ability to align outputs with individual's needs and preferences. However, they still struggle with long-horizon tasks, such as tracking a user's extensive history of conversations or activities. Existing memory mechanisms often fail to capture evolving behaviors, and RAG paradigms are trapped by a quality-efficiency tradeoff. Meanwhile, parametric adaptation is bottlenecked by train-inference gap due to the scarcity of labeled data. To enhance the long-horizon capabilities of PLLMs, we introduce TSUBASA, a two-pronged approach designed to improve memory writing via dynamic memory evolution, and memory reading via self-learning with a context distillation objective to internalize user experiences. Extensive evaluations on long-horizon benchmarks using the Qwen-3 model family (4B to 32B) validate the effectiveness of TSUBASA, surpassing competitive memory-augmented systems that rely primarily on memory writing, such as Mem0 and Memory-R1. Our analyses further confirms that TSUBASA breaks the quality-efficiency barrier to achieve Pareto improvements, delivering robust, high-fidelity personalization with a reduced token budget.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces TSUBASA, a two-pronged approach for long-horizon personalization in PLLMs. It improves memory writing via dynamic memory evolution and memory reading via self-learning with a context distillation objective to internalize user experiences without labeled data. Evaluations on long-horizon benchmarks with Qwen-3 models (4B-32B) claim to surpass memory-augmented baselines such as Mem0 and Memory-R1 while achieving Pareto improvements that break the quality-efficiency tradeoff.

Significance. If the results hold under rigorous scrutiny, TSUBASA would represent a meaningful advance in personalized LLMs by closing the train-inference gap and handling evolving user histories more effectively than prior memory-writing systems, with potential for more efficient deployment in long-horizon conversational applications.

major comments (2)
  1. [Method section on self-learning and context distillation] The central claim that self-learning with context distillation internalizes evolving user experiences and closes the train-inference gap without labeled data or new inconsistencies is load-bearing. The method description provides no explicit mechanism (consistency verification, uncertainty estimation, or iterative validation) to prevent drift or compounding errors across extended sequences—the precise regime where long-horizon gains are asserted.
  2. [Abstract and Experimental Evaluation] The abstract states that extensive evaluations on long-horizon benchmarks validate effectiveness and superiority over Mem0 and Memory-R1, yet no details appear on benchmark definitions, datasets, exact implementation, baseline configurations, statistical significance, or error bars. This absence prevents assessment of whether the data actually supports the surpassing and Pareto-improvement claims.
minor comments (1)
  1. [Method] Notation for the two components (memory evolution and context distillation) could be clarified with a diagram or pseudocode to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, providing clarifications where possible and committing to revisions that strengthen the manuscript without altering its core contributions.

read point-by-point responses
  1. Referee: [Method section on self-learning and context distillation] The central claim that self-learning with context distillation internalizes evolving user experiences and closes the train-inference gap without labeled data or new inconsistencies is load-bearing. The method description provides no explicit mechanism (consistency verification, uncertainty estimation, or iterative validation) to prevent drift or compounding errors across extended sequences—the precise regime where long-horizon gains are asserted.

    Authors: We agree that an explicit discussion of safeguards would strengthen the presentation. The context distillation objective functions as the primary mechanism by training the model to generate and internalize compressed, high-fidelity representations of user experiences in a self-supervised manner; this process inherently prioritizes consistent patterns over transient noise, as the distillation loss penalizes deviations from the evolving context. Our long-horizon experiments empirically support stability, but we acknowledge the value of additional exposition. We will revise the method section to include a dedicated paragraph describing the iterative nature of the distillation loop and add supporting analysis on sequence-length scaling to demonstrate the absence of measurable drift. revision: yes

  2. Referee: [Abstract and Experimental Evaluation] The abstract states that extensive evaluations on long-horizon benchmarks validate effectiveness and superiority over Mem0 and Memory-R1, yet no details appear on benchmark definitions, datasets, exact implementation, baseline configurations, statistical significance, or error bars. This absence prevents assessment of whether the data actually supports the surpassing and Pareto-improvement claims.

    Authors: We appreciate this point on reporting completeness. The experimental section of the manuscript defines the long-horizon benchmarks as multi-turn interaction traces drawn from extended user histories, specifies the Qwen-3 model variants, and describes baseline adaptations for Mem0 and Memory-R1. To ensure full transparency and allow independent verification of the Pareto improvements, we will expand the experimental section with a new subsection containing precise benchmark definitions, dataset sources and preprocessing, exact hyperparameter and implementation details for all systems, and updated results tables that include error bars and statistical significance tests. revision: yes
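The "iterative nature of the distillation loop" invoked in response 1 can be illustrated with a toy convergence argument: for a fixed teacher distribution p, the gradient of KL(p || softmax(s)) with respect to the student logits s is softmax(s) - p, so repeated updates drive the student onto the teacher. This illustrates why the objective is stable; it is not the paper's training code, and the teacher here is a fixed hand-picked distribution:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float((p * np.log(p / q)).sum())

# Teacher next-token distribution, as if conditioned on the full context.
p = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
s = np.zeros_like(p)            # student logits, context absent from prompt

losses = []
for _ in range(2000):
    q = softmax(s)
    losses.append(kl(p, q))
    s -= 0.5 * (q - p)          # exact gradient of KL(p || softmax(s)) in s

assert losses[-1] < 1e-8        # the student has internalized the teacher
assert all(a >= b - 1e-12 for a, b in zip(losses, losses[1:]))  # monotone
```

The monotone decrease holds because the objective is convex and smooth in the logits; whether that stability survives a moving teacher and evolving memories is exactly the drift question the referee raises.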

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external benchmarks, not self-referential definitions or fitted predictions.

full rationale

The paper introduces TSUBASA as a two-component system (dynamic memory evolution for writing; self-learning via context distillation for reading) and validates it through empirical evaluations on long-horizon benchmarks using Qwen-3 models. No equations, derivations, or parameter-fitting steps are described that would reduce predictions to inputs by construction. Central claims of surpassing Mem0 and Memory-R1 and breaking the quality-efficiency barrier are supported by direct comparisons to independent external systems rather than self-citations or renamed patterns. Any references to prior memory work function as background, not load-bearing justifications for uniqueness or correctness. The approach is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities. The approach builds on standard LLM memory and self-learning concepts without detailing new postulated components or fitted values.

pith-pipeline@v0.9.0 · 5500 in / 1097 out tokens · 31394 ms · 2026-05-10T16:45:22.274266+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

88 extracted references · 61 canonical work pages · 16 internal anchors

  1. [1]

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, and 1 others. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774

  2. [2]

    Chris Alberti, Daniel Andor, Emily Pitler, Jacob Devlin, and Michael Collins. 2019. https://doi.org/10.18653/v1/P19-1620 Synthetic QA corpora generation with roundtrip consistency . In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 6168--6173, Florence, Italy. Association for Computational Linguistics

  3. [3]

    Giuseppe Amato and Umberto Straccia. 1999. https://doi.org/10.1007/3-540-48155-9\_13 User profile modeling and applications to digital libraries . In Research and Advanced Technology for Digital Libraries, Third European Conference, ECDL'99, Paris, France, September 22-24, 1999, Proceedings, volume 1696 of Lecture Notes in Computer Science, pages 184--197...

  4. [4]

    R. C. Atkinson and R. M. Shiffrin. 1968. Human memory: A proposed system and its control processes. In K. W. Spence and J. T. Spence, editors, The Psychology of Learning and Motivation, volume 2, pages 89--195. Academic Press, New York

  5. [5]

    Qiguang Chen, Libo Qin, Jinhao Liu, Dengyun Peng, Jiannan Guan, Peng Wang, Mengkang Hu, Yuhang Zhou, Te Gao, and Wanxiang Che. 2025. Towards reasoning era: A survey of long chain-of-thought for reasoning large language models. arXiv preprint arXiv:2503.09567

  6. [6]

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. 2025. Mem0: Building production-ready ai agents with scalable long-term memory. arXiv preprint arXiv:2504.19413

  7. [7]

    Eunbi Choi, Yongrae Jo, Joel Jang, Joonwon Jang, and Minjoon Seo. 2023. https://doi.org/10.18653/v1/2023.findings-acl.533 Fixed input parameterization for efficient prompting . In Findings of the Association for Computational Linguistics: ACL 2023, pages 8428--8441, Toronto, Canada. Association for Computational Linguistics

  8. [8]

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, and 1 others. 2025. Gemini 2.5: Pushing the frontier with advanced reasoning, multimodality, long context, and next generation agentic capabilities. arXiv preprint arXiv:2507.06261

  9. [9]

    Thomas M Cover. 1999. Elements of information theory. John Wiley & Sons

  10. [10]

    Naihao Deng, Xinliang Zhang, Siyang Liu, Winston Wu, Lu Wang, and Rada Mihalcea. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.832 You are what you annotate: Towards better models through annotator representations . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12475--12498, Singapore. Association for Computationa...

  11. [11]

    Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. https://doi.org/10.18653/v1/N19-1423 BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long a...

  12. [12]

    Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. https://arxiv.org/abs/2401.08281 The faiss library

  13. [13]

    Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sebastien Montella, Mirella Lapata, Kam-Fai Wong, and Jeff Z Pan. 2025. Rethinking memory in ai: Taxonomy, operations, topics, and future directions. arXiv e-prints, pages arXiv--2505

  14. [14]

    Mehmet Samet Duran and Tevfik Aytekin. 2025. Beyond one-size-fits-all summarization: Customizing summaries for diverse users. arXiv preprint arXiv:2503.10675

  15. [15]

    Aamir Fareed, Saima Hassan, Samir Brahim Belhaouari, and Zahid Halim. 2023. https://doi.org/10.1016/j.mlwa.2023.100495 A collaborative filtering recommendation framework utilizing social networks . Machine Learning with Applications, 14:100495

  16. [16]

    Gerhard Fischer. 2001. User modeling in human--computer interaction. User modeling and user-adapted interaction, 11(1):65--86

  17. [17]

    Rui Gao, Bibo Hao, Shuotian Bai, Lin Li, Ang Li, and Tingshao Zhu. 2013. https://doi.org/10.1145/2507157.2507219 Improving user profile with personality traits predicted from social media content . In Seventh ACM Conference on Recommender Systems, RecSys '13, Hong Kong, China, October 12-16, 2013 , pages 355--358. ACM

  18. [18]

    Liang Gou, Michelle X. Zhou, and Huahai Yang. 2014. https://doi.org/10.1145/2556288.2557398 Knowme and shareme: understanding automatically discovered personality traits from social media and user sharing preferences . In CHI Conference on Human Factors in Computing Systems, CHI'14, Toronto, ON, Canada - April 26 - May 01, 2014 , pages 955--964. ACM

  19. [19]

    Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web, pages 173--182

  20. [20]

    Stephen J Hoch and George F Loewenstein. 1991. Time-inconsistent preferences and consumer self-control. Journal of consumer research, 17(4):492--507

  21. [21]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, and 1 others. 2022. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3

  22. [22]

    Dana Hughes, Akshat Agarwal, Yue Guo, and Katia Sycara. 2020. Inferring non-stationary human preferences for human-agent teams. In 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), pages 1178--1185. IEEE

  23. [23]

    Meng Jiang, Peng Cui, Fei Wang, Wenwu Zhu, and Shiqiang Yang. 2014. https://doi.org/10.1109/TKDE.2014.2300487 Scalable recommendation with social contextual information . IEEE Trans. Knowl. Data Eng. , 26(11):2789--2802

  24. [24]

    Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed H. Chi, and Derek Zhiyuan Cheng. 2023. https://doi.org/10.48550/ARXIV.2305.06474 Do llms understand user preferences? evaluating llms on user rating prediction. CoRR, abs/2305.06474

  25. [25]

    Jieun Kim, Ahreum Lee, and Hokyoung Ryu. 2013. Personality and its effects on learning performance: Design guidelines for an adaptive e-learning system based on a user model. International Journal of Industrial Ergonomics, 43(5):450--461

  26. [26]

    Sangyeop Kim, Yohan Lee, Sanghwa Kim, Hyunjong Kim, and Sungzoon Cho. 2025. Pre-storage reasoning for episodic memory: Shifting inference burden to memory for personalized dialogue. arXiv preprint arXiv:2509.10852

  27. [27]

    Yehuda Koren, Robert M. Bell, and Chris Volinsky. 2009. https://doi.org/10.1109/MC.2009.263 Matrix factorization techniques for recommender systems . Computer, 42(8):30--37

  28. [28]

    Jaehyeok Lee, Keisuke Sakaguchi, and JinYeong Bak. 2025. https://doi.org/10.18653/v1/2025.naacl-long.528 Self-training meets consistency: Improving LLMs' reasoning with consistency-driven rationale evaluation. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Tec...

  29. [29]

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, and 1 others. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459--9474

  30. [30]

    Minchong Li, Feng Zhou, and Xiaohui Song. 2025. https://aclanthology.org/2025.coling-main.78/ BiLD: Bi-directional logits difference loss for large language model distillation. In Proceedings of the 31st International Conference on Computational Linguistics, pages 1168--1182, Abu Dhabi, UAE. Association for Computational Linguistics

  31. [31]

    Jiahong Liu, Zexuan Qiu, Zhongyang Li, Quanyu Dai, Wenhao Yu, Jieming Zhu, Minda Hu, Menglin Yang, Tat-Seng Chua, and Irwin King. 2025. A survey of personalized large language models: Progress and future directions. arXiv preprint arXiv:2502.11528

  32. [32]

    Junling Liu, Chao Liu, Renjie Lv, Kang Zhou, and Yan Zhang. 2023. https://doi.org/10.48550/ARXIV.2304.10149 Is chatgpt a good recommender? A preliminary study . CoRR, abs/2304.10149

  33. [33]

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. https://doi.org/10.1162/tacl_a_00638 Lost in the middle: How language models use long contexts . Transactions of the Association for Computational Linguistics, 12:157--173

  34. [34]

    Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin, Kaixin Li, Chuyi Kong, Ruichao Yang, and Jing Ma. 2026. From storage to experience: A survey on the evolution of llm agent memory mechanisms

  35. [35]

    Aman Madaan, Niket Tandon, Peter Clark, and Yiming Yang. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.183 Memory-assisted prompt editing to improve GPT-3 after deployment. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2833--2861, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

  36. [36]

    Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, and Maartje ter Hoeve. 2024. https://doi.org/10.48550/ARXIV.2411.13405 On the way to LLM personalization: Learning to remember user conversations . CoRR, abs/2411.13405

  37. [37]

    Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, and Maartje Ter Hoeve. 2025. https://doi.org/10.18653/v1/2025.l2m2-1.5 On the way to LLM personalization: Learning to remember user conversations . In Proceedings of the First Workshop on Large Language Model Memorization (L2M2), pages 61--77, Vienna, Austria. Association for Computational Linguistics

  38. [38]

    Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. https://doi.org/10.18653/v1/2024.acl-long.747 Evaluating very long-term conversational memory of LLM agents . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13851--13870, Bangkok...

  39. [39]

    Sheshera Mysore, Zhuoran Lu, Mengting Wan, Longqi Yang, Bahareh Sarrafzadeh, Steve Menezes, Tina Baghaee, Emmanuel Barajas Gonzalez, Jennifer Neville, and Tara Safavi. 2024. https://doi.org/10.18653/v1/2024.customnlp4u-1.16 Pearl: Personalizing large language model writing assistants with generation-calibrated retrievers . In Proceedings of the 1st Worksh...

  40. [40]

    Jiayan Nan, Wenquan Ma, Wenlong Wu, and Yize Chen. 2025. Nemori: Self-organizing agent memory inspired by cognitive science. arXiv preprint arXiv:2508.03341

  41. [41]

    Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th annual acm symposium on user interface software and technology, pages 1--22

  42. [42]

    Aleksandr V. Petrov and Craig Macdonald. 2023. https://doi.org/10.48550/ARXIV.2306.11114 Generative sequential recommendation with gptrec . CoRR, abs/2306.11114

  43. [43]

    Minh Pham, Minsu Cho, Ameya Joshi, and Chinmay Hegde. 2022. Revisiting self-distillation. arXiv preprint arXiv:2206.08491

  44. [44]

    Bhawna Piryani, Abdelrahman Abdullah, Jamshid Mozafari, Avishek Anand, and Adam Jatowt. 2025. It's high time: A survey of temporal information retrieval and question answering. arXiv e-prints, pages arXiv--2505

  45. [45]

    Erasmo Purificato, Ludovico Boratto, and Ernesto William De Luca. 2024. https://doi.org/10.48550/ARXIV.2402.09660 User modeling and user profiling: A comprehensive survey . CoRR, abs/2402.09660

  46. [46]

    Ruiyang Qin, Jun Xia, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Peipei Zhou, Jingtong Hu, and Yiyu Shi. 2024. Enabling on-device large language model personalization with self-supervised data selection and synthesis. In Proceedings of the 61st ACM/IEEE design automation conference, pages 1--6

  47. [47]

    Yilun Qiu, Xiaoyan Zhao, Yang Zhang, Yimeng Bai, Wenjie Wang, Hong Cheng, Fuli Feng, and Tat-Seng Chua. 2025. https://doi.org/10.18653/v1/2025.findings-acl.1095 Measuring what makes you unique: Difference-aware user modeling for enhancing LLM personalization . In Findings of the Association for Computational Linguistics: ACL 2025, pages 21258--21277, Vien...

  48. [48]

    Zhaopeng Qiu, Xian Wu, Jingyue Gao, and Wei Fan. 2021. https://doi.org/10.1609/AAAI.V35I5.16557 U-BERT: pre-training user representations for improved recommendation . In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Ed...

  49. [49]

    Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. 2025. Zep: a temporal knowledge graph architecture for agent memory. arXiv preprint arXiv:2501.13956

  50. [50]

    Christopher Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, and Abhinav Sethy. 2023. https://doi.org/10.48550/ARXIV.2310.20081 Integrating summarization and retrieval for enhanced personalization via large language models . CoRR, abs/2310.20081

  51. [51]

    Evan F Risko and Sam J Gilbert. 2016. Cognitive offloading. Trends in cognitive sciences, 20(9):676--688

  52. [52]

    Stephen E. Robertson and Hugo Zaragoza. 2009. https://doi.org/10.1561/1500000019 The probabilistic relevance framework: BM25 and beyond. Found. Trends Inf. Retr., 3(4):333--389

  53. [53]

    Rana Salama, Jason Cai, Michelle Yuan, Anna Currey, Monica Sunkara, Yi Zhang, and Yassine Benajiba. 2025. https://doi.org/10.18653/v1/2025.emnlp-main.1683 MemInsight: Autonomous memory augmentation for LLM agents. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33136--33152, Suzhou, China. Association f...

  54. [54]

    Alireza Salemi, Sheshera Mysore, Michael Bendersky, and Hamed Zamani. 2024. https://doi.org/10.18653/v1/2024.acl-long.399 LaMP: When large language models meet personalization. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7370--7392, Bangkok, Thailand. Association for Computa...

  55. [55]

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347

  56. [56]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, and 1 others. 2024. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300

  57. [57]

    Aaditya Singh, Adam Fry, Adam Perelman, Adam Tart, Adi Ganesh, Ahmed El-Kishky, Aidan McLaughlin, Aiden Low, AJ Ostrow, Akhila Ananthram, and 1 others. 2025. Openai gpt-5 system card. arXiv preprint arXiv:2601.03267

  58. [58]

    Charlie Snell, Dan Klein, and Ruiqi Zhong. 2022. Learning by distilling context. arXiv preprint arXiv:2209.15189

  59. [59]

    John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive science, 12(2):257--285

  60. [60]

    Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, and Zhenhua Dong. 2025. https://doi.org/10.18653/v1/2025.findings-acl.989 MemBench: Towards more comprehensive evaluation on the memory of LLM-based agents. In Findings of the Association for Computational Linguistics: ACL 2025, pages 19336--19352, Vienna, Austria. Association for Computational Li...

  61. [61]

    Qingyu Tan, Hwee Tou Ng, and Lidong Bing. 2023. https://doi.org/10.18653/v1/2023.acl-long.828 Towards benchmarking and improving the temporal reasoning capability of large language models . In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 14820--14835, Toronto, Canada. Association fo...

  62. [62]

    Zhaoxuan Tan, Zheyuan Liu, and Meng Jiang. 2024a. https://doi.org/10.18653/v1/2024.emnlp-main.371 Personalized pieces: Efficient personalized large language models through collaborative efforts. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6459--6475, Miami, Florida, USA. Association for Computational...

  63. [63]

    Zhaoxuan Tan, Qingkai Zeng, Yijun Tian, Zheyuan Liu, Bing Yin, and Meng Jiang. 2024b. https://doi.org/10.18653/v1/2024.emnlp-main.372 Democratizing large language models via personalized parameter-efficient fine-tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 6476--6491, Miami, Florida, USA. Asso...

  64. [64]

    Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, and Mark Gerstein. 2024. https://doi.org/10.18653/v1/2024.findings-acl.33 MedAgents: Large language models as collaborators for zero-shot medical reasoning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 599--621, Bangkok, Thailan...

  65. [65]

    Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun-Nung Chen. 2024. https://doi.org/10.18653/v1/2024.findings-emnlp.969 Two tales of persona in LLMs: A survey of role-playing and personalization. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16612--16631, Miami, Florida, USA. Ass...

  66. [66]

    Endel Tulving. 1985. How many memory systems are there? American psychologist, 40(4):385

  67. [67]

    Endel Tulving and 1 others. 1972. Episodic and semantic memory. Organization of memory, 1(381-403):1

  68. [68]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Proces...

  69. [69]

    Haoming Wang, Boyuan Yang, Xiangyu Yin, and Wei Gao. 2025a. Never start from scratch: Expediting on-device llm personalization via explainable model selection. In Proceedings of the 23rd Annual International Conference on Mobile Systems, Applications and Services, pages 154--168

  70. [70]

    Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, and 1 others. 2024. A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6):186345

  71. [71]

    Yu Wang, Ryuichi Takanobu, Zhiqi Liang, Yuzhen Mao, Yuanzhe Hu, Julian McAuley, and Xiaojian Wu. 2025b. Mem-α: Learning memory construction via reinforcement learning. arXiv preprint arXiv:2509.25911

  72. [72]

    Tianxin Wei, Noveen Sachdeva, Benjamin Coleman, Zhankui He, Yuanchen Bei, Xuying Ning, Mengting Ai, Yunzhe Li, Jingrui He, Ed H Chi, and 1 others. 2025. Evo-memory: Benchmarking llm agent test-time learning with self-evolving memory. arXiv preprint arXiv:2511.20857

  73. [73]

    Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. 2025a. https://openreview.net/forum?id=pZiyCaVuti Longmemeval: Benchmarking chat assistants on long-term interactive memory. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net

  74. [74]

    Yaxiong Wu, Sheng Liang, Chen Zhang, Yichao Wang, Yongyue Zhang, Huifeng Guo, Ruiming Tang, and Yong Liu. 2025b. From human memory to ai memory: A survey on memory mechanisms in the era of llms. arXiv preprint arXiv:2504.15965

  75. [75]

    Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. 2025. A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110

  76. [76]

    Sikuan Yan, Xiufeng Yang, Zuchao Huang, Ercong Nie, Zifeng Ding, Zonggen Li, Xiaowen Ma, Jinhe Bi, Kristian Kersting, Jeff Z. Pan, Hinrich Schuetze, Volker Tresp, and Yunpu Ma. 2025. Memory-R1: Enhancing large language model agents to manage and utilize memories via reinforcement learning. arXiv preprint arXiv:2508.19828

  77. [77]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, and 1 others. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388

  78. [78]

    An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, and 22 others. 2024. https://doi.org/10.48550/ARXIV.2412.15115 Qwen2.5 technical report . CoRR, abs/2412.15115

  79. [79]

    Kai Zhang, Yangyang Kang, Fubang Zhao, and Xiaozhong Liu. 2024a. https://doi.org/10.18653/v1/2024.naacl-long.132 LLM-based medical assistant personalization with short- and long-term memory coordination. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vo...

  80. [80]

    Kai Zhang, Lizhi Qing, Yangyang Kang, and Xiaozhong Liu. 2024b. https://doi.org/10.48550/ARXIV.2404.03565 Personalized LLM response generation with parameterized memory injection. CoRR, abs/2404.03565

Showing first 80 references.