pith. sign in

arxiv: 2605.30260 · v1 · pith:IGSIR5EKnew · submitted 2026-05-28 · 💻 cs.CL · cs.AI· cs.CV· cs.LG

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Pith reviewed 2026-06-29 07:21 UTC · model grok-4.3

classification 💻 cs.CL cs.AIcs.CVcs.LG
keywords LoRAparametric memoryfine-tuningpower lawphase transitionLLMmemory capacityMemFT
0
0 comments X

The pith

LoRA fine-tuning follows a power law where loss reduction depends on effective parameters and sequence length, with verbatim recall guaranteed once token prediction exceeds 0.5 probability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats LoRA as a controlled probe inside the latent space to measure exact parametric memory capacity in LLMs. It establishes the Parametric Memory Law, a power-law relation that predicts loss reduction from the count of effective parameters and the length of the sequence being memorized. Token-level analysis uncovers a sharp phase transition: any token whose model-assigned probability rises above 0.5 will be recalled verbatim under greedy decoding. These observations motivate MemFT, a training procedure that reallocates the optimization budget to tokens still below the threshold.

Core claim

The authors demonstrate that the reduction in loss ΔL during LoRA fine-tuning follows a robust power law determined by the number of effective parameters and the sequence length. Fine-grained token analysis shows a deterministic phase transition at p > 0.5, which is sufficient for verbatim recall under greedy decoding. This insight enables a new optimization strategy called MemFT that redistributes training budget to sub-threshold tokens.

What carries the argument

The Parametric Memory Law, a power-law relationship between loss reduction ΔL, effective parameters, and sequence length.

Load-bearing premise

The observed power law and the p > 0.5 phase transition will continue to hold for models, datasets, and LoRA configurations outside the specific controlled experiments performed.

What would settle it

Repeat the latent-space probe experiments on a different model family or dataset and verify whether the fitted power-law exponent stays consistent and whether every token with probability above 0.5 is still recalled exactly under greedy decoding.

Figures

Figures reproduced from arXiv: 2605.30260 by Benglei Cui, Haiwen Hong, Hui Xue, Linsong Yu, Longtao Huang, Ningyu Zhang, Ziwen Xu.

Figure 1
Figure 1. Figure 1: LoRA as a pluggable memory unit in the LLM’s latent space. The LoRA module (rank r) en￾codes contextual knowledge into the residual stream at layer k, enabling faithful recall of memorized text. The Parametric Memory Law quantifies the capacity￾parameter trade-off. Non-parametric methods address this challenge by providing external context during inference. Specifically, approaches such as in-context learn… view at source ↗
Figure 2
Figure 2. Figure 2: Empirical validation of the Parametric Memory Capacity Law (LoRA on Qwen3-8B). (a) ∆L exhibits approximate log-linear decay with respect to rank r and length ℓ, forming a nearly planar structure in log-space; (b) The scatter plot compares predicted ∆L from Eq. (6) against true values, showing high fidelity (R2 = 0.996); (c) Heatmaps plot the final loss and token-level accuracy (correct tokens / total lengt… view at source ↗
Figure 3
Figure 3. Figure 3: Microscopic origin of the Loss-Accuracy Paradox. Results are based on Qwen3-8B trained on the Random dataset. (a) Sparse stubborn positions: A small set of indices where target probabilities persistently remain p < 0.5, resisting improvement even as LoRA rank increases. (b) Correlation with decoding failure: The earliest stubborn position tightly bounds the first failure index i ∗ (Spearman ρ = 0.908, n = … view at source ↗
Figure 4
Figure 4. Figure 4: Training convergence of Qwen3-8B on the Random / Long-Context Memorization Stress Test. Each [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Training convergence of Llama3.1-8B on the Random / Long-Context Memorization Stress Test. The [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Training convergence of Llama3.1-8B on the PhoneBook benchmark. Each subplot corresponds to a [PITH_FULL_IMAGE:figures/full_fig_p016_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Exact-match accuracy of Qwen3-8B on the Random / Long-Context Memorization Stress Test. Each [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Exact-match accuracy of Llama3.1-8B on the Random / Long-Context Memorization Stress Test. The [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Exact-match accuracy on the PhoneBook benchmark for Qwen3-8B and Llama3.1-8B. The upper panels [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Per-position probability grid for the Long-Context Memoriza tion Stress Test Random 100% scenario with r ∈ {8, 10, 12, 14, 16}. This rank range aligns with the LongBench-mixed scenarios for direct comparison [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Per-position probability grid for the Random (100%) scenario with r ∈ {48, 64, 128, 256, 512} [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Per-position probability grid for the Long-Context Memoriza tion Stress Test Random 20% scenario. With 80% semantically coherent tokens from LongBench, the model memorizes more easily and stubborn positions appear only at longer lengths [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Per-position probability grid for the Long-Context Memoriza tion Stress Test Random 60% scenario. With 40% semantically coherent tokens, the difficulty is intermediate between the Random 100% and Random 20% settings, and stubborn positions emerge at shorter lengths compared to [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗
read the original abstract

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving the quantitative capacity limits and underlying dynamics of exact parametric memory largely unexplored. To bridge this gap, we employ LoRA as a controlled memory capacity probe within the latent space to systematically quantify exact parametric memory. We introduce the Parametric Memory Law, a robust power law linking loss reduction Delta L to effective parameters and sequence length. At the token level, fine-grained analysis reveals a deterministic phase transition, demonstrating that a prediction probability of p > 0.5 constitutes a sufficient condition for verbatim recall under greedy decoding. Driven by these insights, we introduce MemFT, a threshold-guided optimization strategy that dynamically redistributes the training budget toward sub-threshold tokens. Empirical evaluations demonstrate that MemFT can enhance memory fidelity and efficiency. Code will be released at https://github.com/zjunlp/ParametricMemoryLaw.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper employs LoRA as a latent-space probe to quantify exact parametric memory in LLMs. It introduces the Parametric Memory Law as a power law relating loss reduction ΔL to effective parameters and sequence length. At the token level, it identifies a deterministic phase transition where p > 0.5 suffices for verbatim recall under greedy decoding, and proposes the MemFT threshold-guided optimization to improve fidelity and efficiency.

Significance. If the power law generalizes and the phase transition provides non-trivial insight, the work could supply a quantitative basis for memory capacity in fine-tuning and motivate more efficient adaptation methods. The empirical nature of the law and the triviality of the p > 0.5 condition, however, limit its current significance absent broader validation.

major comments (3)
  1. [Abstract] Abstract: the sufficiency of p > 0.5 for verbatim recall under greedy decoding follows directly from the definition of argmax and adds no new information beyond the decoding rule itself; this undermines the claim of a 'deterministic phase transition' as a substantive finding.
  2. [Abstract] Abstract: the Parametric Memory Law is asserted as a 'robust power law' without derivation from loss-landscape or capacity arguments and without reported out-of-distribution or cross-model tests; the robustness claim therefore rests entirely on fits within the narrow LoRA probe regime described.
  3. [Abstract] Abstract: MemFT is motivated by the phase-transition insight, yet because that insight reduces to the standard greedy rule, the strategy's claimed improvements require independent verification that is not shown to hold outside the specific models, ranks, and datasets used in the controlled probes.
minor comments (1)
  1. [Abstract] The abstract states that code will be released but provides no details on how the power-law parameters or phase-transition thresholds were obtained, hindering immediate reproducibility assessment.

Simulated Author's Rebuttal

3 responses · 2 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below with clarifications and indicate proposed revisions to better align the abstract with the empirical scope of our findings.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the sufficiency of p > 0.5 for verbatim recall under greedy decoding follows directly from the definition of argmax and adds no new information beyond the decoding rule itself; this undermines the claim of a 'deterministic phase transition' as a substantive finding.

    Authors: We agree that p > 0.5 follows directly from the argmax operation in greedy decoding. The contribution in our work is the empirical demonstration, via fine-grained token-level analysis during LoRA-based memory insertion, that the per-token prediction probability crosses this threshold at the point of verbatim memorization. We will revise the abstract to frame this as an observed transition in the learning trajectory rather than a novel theoretical phase transition. revision: partial

  2. Referee: [Abstract] Abstract: the Parametric Memory Law is asserted as a 'robust power law' without derivation from loss-landscape or capacity arguments and without reported out-of-distribution or cross-model tests; the robustness claim therefore rests entirely on fits within the narrow LoRA probe regime described.

    Authors: The Parametric Memory Law is presented as an empirical power-law relationship observed consistently across controlled LoRA probing experiments varying effective parameters and sequence lengths. We do not claim a first-principles derivation. We will revise the abstract to qualify the robustness claim as holding within the reported experimental regime and add a limitations discussion noting the need for further validation. revision: partial

  3. Referee: [Abstract] Abstract: MemFT is motivated by the phase-transition insight, yet because that insight reduces to the standard greedy rule, the strategy's claimed improvements require independent verification that is not shown to hold outside the specific models, ranks, and datasets used in the controlled probes.

    Authors: MemFT is a threshold-guided training strategy whose performance gains are shown through direct empirical comparisons in our evaluations. We will revise the abstract to decouple the description of MemFT from the phase-transition phrasing and to highlight that its benefits are demonstrated empirically within the evaluated settings. revision: partial

standing simulated objections not resolved
  • Absence of a theoretical derivation of the Parametric Memory Law from loss-landscape or capacity arguments.
  • Lack of out-of-distribution or additional cross-model validation beyond the LoRA probe experiments reported.

Circularity Check

1 steps flagged

p>0.5 'phase transition' reduces to argmax definition; power law presented as introduced without shown first-principles derivation

specific steps
  1. other [Abstract]
    "At the token level, fine-grained analysis reveals a deterministic phase transition, demonstrating that a prediction probability of p > 0.5 constitutes a sufficient condition for verbatim recall under greedy decoding."

    The sufficiency of p > 0.5 for selection under greedy decoding is a direct logical consequence of the argmax operation (the token with probability >0.5 must be the maximum), so the claimed 'deterministic phase transition' and 'demonstrating' add no new information beyond restating the decoding rule itself.

full rationale

The abstract presents the Parametric Memory Law as introduced from empirical observation and the p>0.5 condition as a revealed deterministic transition. No equations or derivation chain appear in the provided text that would allow reduction to fitted inputs or self-citations. The p>0.5 sufficiency follows directly from the definition of greedy decoding rather than from any model-specific analysis, but this is a single logical observation rather than a load-bearing derivation chain that collapses the central result. No self-citation, ansatz smuggling, or uniqueness theorem is invoked. The paper remains self-contained as an empirical report with the noted tautology on one sub-claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based solely on abstract; full text unavailable so free parameters, axioms, and invented entities cannot be audited. The power law itself likely introduces at least one fitted exponent and the 0.5 threshold is presented as discovered rather than derived from first principles.

pith-pipeline@v0.9.1-grok · 5737 in / 1227 out tokens · 26187 ms · 2026-06-29T07:21:52.021952+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Memory Depth, Not Memory Access: Selective Parametric Consolidation for Long-Running Language Agents

    cs.AI 2026-06 unverdicted novelty 6.0

    EVAF, a surprise- and valence-gated LoRA mechanism, provides memory depth for goal persistence in language agents via the loop-drift protocol, complementary to retrieval.

  2. Yuvion LLM: An Adversarially-Aware Large Language Model for Content And AI Safety

    cs.CL 2026-06 unverdicted novelty 5.0

    Yuvion LLM applies adversarially aware training and introduces the YLRE benchmark set, claiming superior safety robustness over larger models on multiple tasks.

Reference graph

Works this paper leans on

48 extracted references · 30 canonical work pages · cited by 2 Pith papers · 10 internal anchors

  1. [1]

    M. H. I. Abdalla, Zhipin Wang, Christian Frey, Steffen Eger, and Josif Grabocka. 2025. https://doi.org/10.48550/ARXIV.2510.19733 Zhyper: Factorized hypernetworks for conditioned LLM fine-tuning . CoRR, abs/2510.19733

  2. [2]

    Seungju Back, Dongwoo Lee, Naun Kang, Taehee Lee, S. K. Hong, Youngjune Gwon, and Sungjin Ahn. 2026. https://doi.org/10.48550/ARXIV.2603.01097 Understanding lora as knowledge memory: An empirical analysis . CoRR, abs/2603.01097

  3. [3]

    Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li. 2024. https://doi.org/10.18653/V1/2024.ACL-LONG.172 Longbench: A bilingual, multitask benchmark for long context understanding . In Proceedings of the 62nd Annual Meeting of the Association for Com...

  4. [4]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert - Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litw...

  5. [5]

    Rujikorn Charakorn, Edoardo Cetin, Yujin Tang, and Robert Tjarko Lange. 2025. https://proceedings.mlr.press/v267/charakorn25a.html Text-to-lora: Instant transformer adaption . In Forty-second International Conference on Machine Learning, ICML 2025, Vancouver, BC, Canada, July 13-19, 2025 , Proceedings of Machine Learning Research. PMLR / OpenReview.net

  6. [6]

    Rujikorn Charakorn, Edoardo Cetin, Shinnosuke Uesaka, and Robert Tjarko Lange. 2026. https://doi.org/10.48550/ARXIV.2602.15902 Doc-to-lora: Learning to instantly internalize contexts . CoRR, abs/2602.15902

  7. [7]

    Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, and Hao Cheng. 2025. https://openreview.net/forum?id=bc3sUsS6ck Generative adapter: Contextualizing language models in parameters with A single forward pass . In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, Apri...

  8. [8]

    Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. 2025. https://doi.org/10.3233/FAIA251160 Mem0: Building production-ready AI agents with scalable long-term memory . In ECAI 2025 - 28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy - Including 14th Conference on Prestigious Applications of I...

  9. [9]

    Gr \' e goire Del \' e tang, Anian Ruoss, Paul - Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau - Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, and Joel Veness. 2024. https://openreview.net/forum?id=jznbgiynus Language modeling is compression . In The Twelfth International Conference on Learning ...

  10. [10]

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. 2024. https://doi.org/10.18653/V1/2024.EMNLP-MAIN.64 A survey on in-context learning . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, Nov...

  11. [11]

    Jizhan Fang, Xinle Deng, Haoming Xu, Ziyan Jiang, Yuqi Tang, Ziwen Xu, Shumin Deng, Yunzhi Yao, Mengru Wang, Shuofei Qiao, Huajun Chen, and Ningyu Zhang. 2025. https://doi.org/10.48550/ARXIV.2510.18866 Lightmem: Lightweight and efficient memory-augmented generation . CoRR, abs/2510.18866

  12. [12]

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. https://doi.org/10.48550/ARXIV.2312.10997 Retrieval-augmented generation for large language models: A survey . CoRR, abs/2312.10997

  13. [13]

    Human-inspired perspectives: A survey on ai long-term memory,

    Zihong He, Weizhe Lin, Hao Zheng, Fan Zhang, Matt W. Jones, Laurence Aitchison, Xuhai Xu, Miao Liu, Per Ola Kristensson, and Junxiao Shen. 2024. https://doi.org/10.48550/ARXIV.2411.00489 Human-inspired perspectives: A survey on AI long-term memory . CoRR, abs/2411.00489

  14. [14]

    Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen - Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen

    Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen - Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2022. https://openreview.net/forum?id=nZeVKeeFYf9 Lora: Low-rank adaptation of large language models . In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022 . OpenReview.net

  15. [15]

    Kakade, and Eran Malach

    Samy Jelassi, David Brandfonbrener, Sham M. Kakade, and Eran Malach. 2024. https://proceedings.mlr.press/v235/jelassi24a.html Repeat after me: Transformers are better than state space models at copying . In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024 , Proceedings of Machine Learning Research, pag...

  16. [16]

    Josip Jukic, Martin Tutek, and Jan Snajder. 2025. https://doi.org/10.48550/ARXIV.2509.22158 Context parametrization with compositional adapters . CoRR, abs/2509.22158

  17. [17]

    Jiazheng Kang, Mingming Ji, Zhe Zhao, and Ting Bai. 2025. https://doi.org/10.18653/V1/2025.EMNLP-MAIN.1318 Memory OS of AI agent . In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, EMNLP 2025, Suzhou, China, November 4-9, 2025 , pages 25961--25970. Association for Computational Linguistics

  18. [18]

    Sorokin, and Mikhail Burtsev

    Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Y. Sorokin, and Mikhail Burtsev. 2024. http://papers.nips.cc/paper\_files/paper/2024/hash/c0d62e70dbc659cc9bd44cbcf1cb652f-Abstract-Datasets\_and\_Benchmarks\_Track.html Babilong: Testing the limits of llms with long context reasoning-in-a-haystack . In Advances in Neural Infor...

  19. [19]

    Jingdi Lei, Di Zhang, Junxian Li, Weida Wang, Kaixuan Fan, Xiang Liu, Qihan Liu, Xiaoteng Ma, Baian Chen, and Soujanya Poria. 2026. https://api.semanticscholar.org/CorpusID:288259118 -mem: Efficient online memory for large language models

  20. [20]

    u ttler, Mike Lewis, Wen - tau Yih, Tim Rockt \

    Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich K \" u ttler, Mike Lewis, Wen - tau Yih, Tim Rockt \" a schel, Sebastian Riedel, and Douwe Kiela. 2020. https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html Retrieval-augmented generation for knowledge-intensive ...

  21. [21]

    Jiakai Li, Rongzheng Wang, Yizhuo Ma, Shuang Liang, Guangchun Luo, and Ke Qin. 2026. Dsas: A universal plug-and-play framework for attention optimization in multi-document question answering. Advances in Neural Information Processing Systems, 38:174538--174564

  22. [22]

    Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, Junpeng Ren, Zehao Lin, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhiqiang Yin, Qingchen Yu, Bo Tang, Hongkang Yang, Zhi - Qin John Xu, and Feiyu Xiong. 2025. https://doi.org/10.48550/ARXIV.2505.22101 Memos: An operating system fo...

  23. [23]

    Bronstein, Yang You, Zhangyang Wang, and Kai Wang

    Zhiyuan Liang, Dongwen Tang, Yuhao Zhou, Xuanlei Zhao, Mingjia Shi, Wangbo Zhao, Zekai Li, Peihao Wang, Konstantin Sch \" u rholt, Damian Borth, Michael M. Bronstein, Yang You, Zhangyang Wang, and Kai Wang. 2025. https://doi.org/10.48550/ARXIV.2506.16406 Drag-and-drop llms: Zero-shot prompt-to-weights . CoRR, abs/2506.16406

  24. [24]

    Lei Liu, Xiaoyan Yang, Yue Shen, Binbin Hu, Zhiqiang Zhang, Jinjie Gu, and Guannan Zhang. 2023. https://doi.org/10.48550/ARXIV.2311.08719 Think-in-memory: Recalling and post-thinking enable llms with long-term memory . CoRR, abs/2311.08719

  25. [25]

    Musique: Multihop questions via single-hop question composition.Trans

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. https://doi.org/10.1162/TACL\_A\_00638 Lost in the middle: How language models use long contexts . Trans. Assoc. Comput. Linguistics, 12:157--173

  26. [26]

    Adyasha Maharana, Dong - Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. 2024. https://doi.org/10.18653/V1/2024.ACL-LONG.747 Evaluating very long-term conversational memory of LLM agents . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thaila...

  27. [27]

    Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. 2022. http://papers.nips.cc/paper\_files/paper/2022/hash/6f1d43d5a82a37e89b0665b33bf3a182-Abstract-Conference.html Locating and editing factual associations in GPT . In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2...

  28. [28]

    Ponti, Laurent Charlin, Nicolas Le Roux, Lucas Caccia, and Alessandro Sordoni

    Oleksiy Ostapenko, Zhan Su, Edoardo M. Ponti, Laurent Charlin, Nicolas Le Roux, Lucas Caccia, and Alessandro Sordoni. 2024. https://proceedings.mlr.press/v235/ostapenko24a.html Towards modular llms by building and reusing a library of loras . In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024 , Procee...

  29. [29]

    MemGPT: Towards LLMs as Operating Systems

    Charles Packer, Vivian Fang, Shishir G. Patil, Kevin Lin, Sarah Wooders, and Joseph E. Gonzalez. 2023. https://doi.org/10.48550/ARXIV.2310.08560 Memgpt: Towards llms as operating systems . CoRR, abs/2310.08560

  30. [30]

    Sergey Pletenev, Maria Marina, Daniil Moskovskiy, Vasily Konovalov, Pavel Braslavski, Alexander Panchenko, and Mikhail Salnikov. 2025. https://doi.org/10.18653/V1/2025.FINDINGS-NAACL.243 How much knowledge can you pack into a lora adapter without harming llm? In Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico...

  31. [31]

    Reyna and C.J

    V.F. Reyna and C.J. Brainerd. 1995. https://doi.org/10.1016/1041-6080(95)90031-4 Fuzzy-trace theory: An interim synthesis . Learning and Individual Differences, 7(1):1--75. Special Issue: Fuzzy-Trace Theory

  32. [32]

    Weihang Su, Yichen Tang, Qingyao Ai, Junxi Yan, Changyue Wang, Hongning Wang, Ziyi Ye, Yujia Zhou, and Yiqun Liu. 2025. https://doi.org/10.1145/3726302.3729957 Parametric retrieval augmented generation . In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2025, Padua, Italy, July 13-18,...

  33. [33]

    Yuqiao Tan, Shizhu He, Huanxuan Liao, Jun Zhao, and Kang Liu. 2025 a . https://doi.org/10.48550/ARXIV.2503.23895 Better wit than wealth: Dynamic parametric retrieval augmented generation for test-time knowledge enhancement . CoRR, abs/2503.23895

  34. [34]

    Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Rajan Iyer, Tianlong Chen, Huan Liu, Chen - Yu Lee, and Tomas Pfister

    Zhen Tan, Jun Yan, I - Hung Hsu, Rujun Han, Zifeng Wang, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Rajan Iyer, Tianlong Chen, Huan Liu, Chen - Yu Lee, and Tomas Pfister. 2025 b . https://aclanthology.org/2025.acl-long.413/ In prospect and retrospect: Reflective memory management for long-term personalized dialogue agents . In P...

  35. [35]

    Llama Team. 2024. https://doi.org/10.48550/ARXIV.2407.21783 The llama 3 herd of models . CoRR, abs/2407.21783

  36. [36]

    Qwen Team. 2025. https://doi.org/10.48550/ARXIV.2505.09388 Qwen3 technical report . CoRR, abs/2505.09388

  37. [37]

    Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai - Wei Chang, and Dong Yu. 2025. https://openreview.net/forum?id=pZiyCaVuti Longmemeval: Benchmarking chat assistants on long-term interactive memory . In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025 . OpenReview.net

  38. [38]

    Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, and Mike Lewis. 2024. https://openreview.net/forum?id=NG7sS51zVF Efficient streaming language models with attention sinks . In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024 . OpenReview.net

  39. [39]

    Wujiang Xu, Zujie Liang, Kai Mei, Hang Gao, Juntao Tan, and Yongfeng Zhang. 2025. https://doi.org/10.48550/ARXIV.2502.12110 A-MEM: agentic memory for LLM agents . CoRR, abs/2502.12110

  40. [40]

    Ziwen Xu, Chenyan Wu, Hengyu Sun, Haiwen Hong, Mengru Wang, Yunzhi Yao, Longtao Huang, Hui Xue, Shumin Deng, Zhixuan Chu, Huajun Chen, and Ningyu Zhang. 2026. https://doi.org/10.48550/ARXIV.2602.02343 Why steering works: Toward a unified view of language model parameter dynamics . CoRR, abs/2602.02343

  41. [41]

    Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, and Weinan E. 2024. https://doi.org/10.48550/ARXIV.2407.01178 Memory\( ^ 3 \): Language modeling with explicit memory . CoRR, abs/2407.01178

  42. [42]

    Yunzhi Yao, Peng Wang, Bozhong Tian, Siyuan Cheng, Zhoubo Li, Shumin Deng, Huajun Chen, and Ningyu Zhang. 2023. https://doi.org/10.18653/V1/2023.EMNLP-MAIN.632 Editing large language models: Problems, methods, and opportunities . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10...

  43. [43]

    Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, and Tuo Zhao. 2023. https://openreview.net/forum?id=lq62uWRJjiY Adaptive budget allocation for parameter-efficient fine-tuning . In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023 . OpenReview.net

  44. [44]

    Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2)

  45. [45]

    Wanjun Zhong, Lianghong Guo, Qiqi Gao, He Ye, and Yanlin Wang. 2024. https://doi.org/10.1609/AAAI.V38I17.29946 Memorybank: Enhancing large language models with long-term memory . In Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Sympo...

  46. [46]

    Tongyao Zhu, Qian Liu, Liang Pang, Zhengbao Jiang, Min - Yen Kan, and Min Lin. 2024. https://doi.org/10.18653/V1/2024.ACL-LONG.185 Beyond memorization: The challenge of random memory access in language models . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2024, Bangkok, Thailand, A...

  47. [47]

    online" 'onlinestring :=

    ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

  48. [48]

    write newline

    " write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...