From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction
Pith reviewed 2026-05-07 06:53 UTC · model grok-4.3
The pith
Reliable AI memory requires schemas that guide iterative extraction and validation at write time rather than text retrieval at read time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that reliable external AI memory must be schema-grounded. Schemas define what must be remembered, what may be ignored, and which values must never be inferred. The authors present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control. The result shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose. On the end-to-end memory benchmark the system reaches 97.10% F1 compared with 80.16%–87.24% for baselines; on the application-level task it reaches 95.2% accuracy, outperforming specialised memory systems and frontier-model application harnesses.
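The paper's actual schema format is not reproduced in this review. As a hypothetical illustration of the three constraint classes the claim names (must be remembered, may be ignored, must never be inferred), a memory schema might look like the following sketch; the object name, field names, and flag names here are assumptions, not the paper's notation:

```python
# Hypothetical memory schema. The three constraint classes from the
# core claim map onto: required fields (must be remembered), optional
# fields (may be ignored), and an explicit unknown marker for values
# that must never be inferred.
USER_PROFILE_SCHEMA = {
    "object": "user_profile",
    "fields": {
        # Must be remembered: extraction is mandatory when stated.
        "home_city": {"required": True, "infer": False},
        "dietary_restrictions": {"required": True, "infer": False},
        # May be ignored: stored only if the user states it explicitly.
        "favorite_color": {"required": False, "infer": False},
    },
    # A value that cannot be grounded in the input stays an explicit
    # unknown rather than being guessed by the model.
    "unknown_marker": "UNKNOWN",
}
```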
What carries the argument
Iterative schema-aware write path that decomposes ingestion into object detection, field detection, field-value extraction with validation gates, local retries, and stateful prompt control.
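A minimal sketch of such a write path, with the model call stubbed out behind a callback. The function names, gate logic (accepting only values literally present in the input), and retry budget are assumptions for illustration, not the paper's implementation:

```python
from typing import Callable, Optional

def ingest(text: str,
           schema: dict,
           extract_value: Callable[[str, str], Optional[str]],
           max_retries: int = 2) -> dict:
    """Iterative schema-aware write path (sketch): for each schema field,
    extract a candidate value, pass it through a validation gate, and
    retry locally on failure instead of failing the whole record."""
    record = {}
    for field in schema["fields"]:
        value = None
        for _ in range(max_retries + 1):
            candidate = extract_value(text, field)
            # Validation gate: accept only values literally grounded in
            # the input text, so nothing is silently inferred.
            if candidate is not None and candidate in text:
                value = candidate
                break
        if value is None:
            # Gate never passed: store an explicit unknown, not a guess.
            record[field] = schema.get("unknown_marker", "UNKNOWN")
        else:
            record[field] = value
    return record
```

With a stub extractor that hallucinates an employer not present in the input, the gate rejects the inferred value and the record carries an explicit unknown instead.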
If this is right
- Memory operations such as updates, deletions, aggregations, and negative queries become reliable because they operate on verified records rather than inferred text.
- On the structured extraction benchmark, object-level accuracy reaches 90.42% and output accuracy 62.67%, above all tested frontier structured-output baselines.
- End-to-end memory performance reaches 97.10% F1, exceeding third-party baselines that range from 80.16% to 87.24%.
- Application-level tasks reach 95.2% accuracy, outperforming specialised memory systems, Markdown harnesses, and frontier-model application harnesses.
- For workloads that require stable facts and stateful computation, architecture and schema design matter more than retrieval scale or model strength alone.
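The first bullet can be made concrete: once writes are validated, aggregations and negative queries become ordinary queries over records rather than model inference over retrieved prose. The record shape below is assumed for illustration, not the paper's storage format:

```python
# Verified records as a schema-grounded write path might produce them
# (shape is illustrative).
records = [
    {"object": "subscription", "name": "NewsPlus", "status": "cancelled", "price": 10.0},
    {"object": "subscription", "name": "StreamCo", "status": "active", "price": 15.0},
    {"object": "subscription", "name": "CloudBox", "status": "active", "price": 5.0},
]

# Aggregation: total monthly spend on active subscriptions.
active_total = sum(r["price"] for r in records if r["status"] == "active")

# Negative query: which subscriptions are NOT active? Answerable exactly
# because absence is checked against verified state, not re-inferred
# from retrieved text.
not_active = [r["name"] for r in records if r["status"] != "active"]
```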
Where Pith is reading between the lines
- Agents that run for many turns could maintain consistent long-term state by catching inconsistencies at ingestion rather than accumulating retrieval errors.
- Investing effort in domain-specific schemas and write-time validation may deliver higher reliability than further scaling of retrieval indices or model size.
- Domains with regulatory constraints on what may be inferred could adopt the same validation gates to enforce explicit unknowns and prevent over-inference.
- The design invites direct comparison with traditional database systems adapted for LLM-driven updates, where the same separation of write validation and read queries already exists.
Load-bearing premise
The schemas supplied correctly capture all relevant facts and constraints for the target domains, and the validation gates do not systematically reject valid information or accept invalid information in ways that bias the downstream results.
What would settle it
Run the system on a domain whose schemas are deliberately incomplete for critical facts and measure whether benchmark F1 or application accuracy drops below the reported levels while error rates on state updates rise.
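The measurement side of such a test can be sketched with a simple metric: micro-F1 over (field, value) pairs of an extracted record against a gold record. This is one plausible scoring choice, not necessarily the benchmark's exact F1 definition:

```python
def field_f1(predicted: dict, gold: dict) -> float:
    """Micro-F1 over (field, value) pairs for a single extracted record."""
    pred_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    tp = len(pred_pairs & gold_pairs)  # exact field-value matches
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)
```

Running this under complete versus deliberately incomplete schemas, and tracking how the score moves alongside state-update error rates, would separate the contribution of the architecture from that of schema provision.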
Original abstract
Persistent AI memory is often reduced to a retrieval problem: store prior interactions as text, embed them, and ask the model to recover relevant context later. This design is useful for thematic recall, but it is mismatched to the kinds of memory that agents need in production: exact facts, current state, updates and deletions, aggregation, relations, negative queries, and explicit unknowns. These operations require memory to behave less like search and more like a system of record. This paper argues that reliable external AI memory must be schema-grounded. Schemas define what must be remembered, what may be ignored, and which values must never be inferred. We present an iterative, schema-aware write path that decomposes memory ingestion into object detection, field detection, and field-value extraction, with validation gates, local retries, and stateful prompt control. The result shifts interpretation from the read path to the write path: reads become constrained queries over verified records rather than repeated inference over retrieved prose. We evaluate this design on structured extraction and end-to-end memory benchmarks. On the extraction benchmark, the judge-in-the-loop configuration reaches 90.42% object-level accuracy and 62.67% output accuracy, above all tested frontier structured-output baselines. On our end-to-end memory benchmark, xmemory reaches 97.10% F1, compared with 80.16%-87.24% across the third-party baselines. On the application-level task, xmemory reaches 95.2% accuracy, outperforming specialised memory systems, code-generated Markdown harnesses, and customer-facing frontier-model application harnesses. The results show that, for memory workloads requiring stable facts and stateful computation, architecture matters more than retrieval scale or model strength alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper argues that reliable AI memory requires schema-grounded storage rather than unstructured text retrieval, as the latter fails on exact facts, state updates, aggregations, and negative queries. It introduces an iterative schema-aware write path that decomposes ingestion into object detection, field detection, value extraction, validation gates, local retries, and stateful prompt control. Reads are then reduced to constrained queries over verified records. Evaluation on a structured extraction benchmark shows 90.42% object-level accuracy and 62.67% output accuracy (judge-in-the-loop) above frontier baselines; an end-to-end memory benchmark yields 97.10% F1 versus 80.16%–87.24% for third-party systems; and an application-level task reaches 95.2% accuracy, outperforming specialized memory systems and frontier-model harnesses. The central conclusion is that architecture matters more than retrieval scale or model strength for stateful memory workloads.
Significance. If the performance gains are attributable to the iterative write-path design rather than schema alignment alone, the work offers a practical engineering shift from RAG-style memory to structured systems of record. This could matter for agentic applications that require stable facts, updates/deletions, and precise stateful computation, moving memory design from retrieval optimization toward verifiable data models.
major comments (3)
- [§4.2] §4.2 (end-to-end memory benchmark): the 97.10% F1 result is reported without rejection rates, false-negative rates on valid extractions, or any control condition using deliberately incomplete or mismatched schemas. Because the benchmarks supply complete, task-matched schemas up front, it is impossible to determine whether the gains derive from the iterative extraction architecture or from schema provision plus gate filtering; this directly undermines the claim that the write-path design is the decisive factor.
- [§4.1] §4.1 (structured extraction benchmark): the 90.42% object-level and 62.67% output accuracies are presented without stating whether schema definitions and data splits were fixed before any results were inspected or whether post-hoc refinement occurred. In the absence of this information, the superiority over frontier structured-output baselines cannot be confidently attributed to the method rather than evaluation design choices.
- [§4] §4 (baseline comparisons): the paper does not detail how (or whether) equivalent schemas were supplied to the third-party memory systems, code-generated Markdown harnesses, and customer-facing frontier-model harnesses. If xmemory alone receives explicit schema guidance while baselines operate under weaker or absent schema constraints, the reported accuracy gaps (95.2% vs. lower scores) cannot be interpreted as evidence that architecture outperforms retrieval scale or model strength.
minor comments (3)
- [Title and abstract] The system is referred to as 'xmemory' in the abstract and results but is not named in the title; adding the name or a short system description to the title would improve discoverability.
- [Method section] The description of the iterative loop (object detection → field detection → validation gates → retry) would be clearer with a short pseudocode listing or state-transition diagram in the method section.
- [Evaluation tables/figures] Table or figure captions for the benchmark results should explicitly list the exact schemas and judge prompts used, or provide a pointer to the supplementary material containing them.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our evaluation methodology. We have revised the manuscript to improve transparency around schema provision, pre-specification of evaluation parameters, and additional metrics. Below we respond to each major comment.
Point-by-point responses
Referee: [§4.2] §4.2 (end-to-end memory benchmark): the 97.10% F1 result is reported without rejection rates, false-negative rates on valid extractions, or any control condition using deliberately incomplete or mismatched schemas. Because the benchmarks supply complete, task-matched schemas up front, it is impossible to determine whether the gains derive from the iterative extraction architecture or from schema provision plus gate filtering; this directly undermines the claim that the write-path design is the decisive factor.
Authors: We agree that rejection rates and false-negative rates should have been reported. The revised §4.2 now includes these values computed from our experimental logs. However, the benchmark was intentionally scoped to complete, task-matched schemas because that matches the target use case of schema-grounded memory as a verified system of record. We have added text clarifying this design choice and explaining why incomplete-schema controls fall outside the current evaluation scope. We maintain that the performance gap versus retrieval baselines (which receive equivalent schema information where possible) supports the contribution of the iterative write path. revision: partial
Referee: [§4.1] §4.1 (structured extraction benchmark): the 90.42% object-level and 62.67% output accuracies are presented without stating whether schema definitions and data splits were fixed before any results were inspected or whether post-hoc refinement occurred. In the absence of this information, the superiority over frontier structured-output baselines cannot be confidently attributed to the method rather than evaluation design choices.
Authors: The schema definitions were derived directly from the benchmark specification and the data splits were determined via a fixed, deterministic procedure before any model runs or result inspection occurred. No post-hoc refinement of schemas or splits took place. We have added an explicit statement in the revised §4.1 confirming this pre-specification protocol. revision: yes
Referee: [§4] §4 (baseline comparisons): the paper does not detail how (or whether) equivalent schemas were supplied to the third-party memory systems, code-generated Markdown harnesses, and customer-facing frontier-model harnesses. If xmemory alone receives explicit schema guidance while baselines operate under weaker or absent schema constraints, the reported accuracy gaps (95.2% vs. lower scores) cannot be interpreted as evidence that architecture outperforms retrieval scale or model strength.
Authors: Equivalent schema information was supplied to every baseline that supports structured input. Third-party memory systems and frontier harnesses received the schemas through their native structured interfaces; Markdown harnesses received schema content translated into detailed prompt instructions. The revised §4 now contains a dedicated paragraph and table that documents the exact schema provision method used for each baseline, making the comparison transparent. revision: yes
- Not provided: control experiment with deliberately incomplete or mismatched schemas on the end-to-end memory benchmark (no such results were generated, as the evaluation was scoped to complete schemas)
Circularity Check
No circularity: empirical engineering evaluation on external benchmarks
full rationale
The paper describes an iterative schema-aware extraction architecture for AI memory and reports empirical results on structured extraction and end-to-end memory benchmarks (90.42% object-level accuracy, 97.10% F1, 95.2% accuracy). No equations, fitted parameters, predictions derived from those parameters, or self-citations appear in the abstract or evaluation description. The central claims rest on direct comparison to third-party baselines rather than any reduction of outputs to inputs by definition or construction. The method is presented as an engineering design whose performance is measured externally, with no load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs can be prompted to perform object detection, field detection, and value extraction when given an explicit schema.
- domain assumption: Validation gates and local retries can correct extraction errors without introducing new systematic bias.
Reference graph
Works this paper leans on
- [1] Yuyang Hu, Shichun Liu, Yanwei Yue, Guibin Zhang, Boyang Liu, Fangyi Zhu, Jiahang Lin, Honglin Guo, Shihan Dou, Zhiheng Xi, et al. Memory in the Age of AI Agents. arXiv, 2025.
- [2] Yifan Du, Chongyang Huang, Wayne Xin Zhao, Ji-Rong Wen, et al. Memory operations in large language models: A survey (Rethinking memory in AI: taxonomy, operations, topics, and future directions). arXiv preprint arXiv:2505.00675, 2025. doi:10.48550/arXiv.2505.00675. https://arxiv.org/abs/2505.00675
- [4] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
- [5] Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020. https://arxiv.org/abs/2004.04906
- [6] Prateek Chhikara, Dev Khant, Saket Aryan, Taranjeet Singh, and Deshraj Yadav. Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. arXiv preprint arXiv:2504.19413, 2025. doi:10.48550/arXiv.2504.19413. https://arxiv.org/abs/2504.19413
- [7] Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of EMNLP-IJCNLP, 2019. https://arxiv.org/abs/1908.10084
- [8] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics (TACL), 2024. doi:10.1162/tacl_a_00449. https://arxiv.org/abs/2307.03172
- [9] Darshan Deshpande, Varun Gangal, Hersh Mehta, Anand Kannappan, Rebecca Qian, and Peng Wang. MEMTRACK: Evaluating Long-Term Memory and State Tracking in Multi-Platform Dynamic Agent Environments. arXiv preprint arXiv:2510.01353, 2025. doi:10.48550/arXiv.2510.01353. https://arxiv.org/abs/2510.01353
- [10] Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, and Yuwei Fang. LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM Agents. arXiv preprint arXiv:2402.17753, 2024. doi:10.48550/arXiv.2402.17753. https://arxiv.org/abs/2402.17753
- [11] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley, 2nd edition. https://onlinelibrary.wiley.com/doi/book/10.1002/047174882X
- [13] Claude E. Shannon. A Mathematical Theory of Communication. The Bell System Technical Journal, 27(3):379–423, 1948.
- [14] Shizhe He, Avanika Narayan, Ishan S. Khare, Scott W. Linderman, Christopher Ré, and Dan Biderman. An Information Theoretic Perspective on Agentic System Design. arXiv preprint arXiv:2512.21720, 2025. doi:10.48550/arXiv.2512.21720. https://arxiv.org/abs/2512.21720
- [15] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
- [17] Adam Tauman Kalai, Ofir Nachum, Santosh S. Vempala, and Edwin Zhang. Why Language Models Hallucinate. arXiv preprint arXiv:2509.04664, 2025. doi:10.48550/arXiv.2509.04664. https://arxiv.org/abs/2509.04664
- [18] Rodrigo Nogueira and Kyunghyun Cho. Passage Re-ranking with BERT. arXiv preprint arXiv:1901.04085, 2019. https://arxiv.org/abs/1901.04085
- [19] Muhammad Ahmed Mohsin, Muhammad Umer, Ahsan Bilal, Zeeshan Memon, Muhammad Ibtsaam Qadir, Sagnik Bhattacharya, Hassan Rizwan, Abhiram R. Gorle, Maahe Zehra Kazmi, Ayesha Mohsin, Muhammad Usman Rafique, Zihao He, Pulkit Mehta, Muhammad Ali Jamshed, and John M. Cioffi. On the Fundamental Limits of LLMs at Scale. arXiv preprint arXiv:2511.12869, 2025. https://arxiv.org/abs/2511.12869
- [20] Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, and Jonathan Larson. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv preprint arXiv:2404.16130, 2024. https://arxiv.org/abs/2404.16130
- [21] Fangyu Lei, Jixuan Chen, Yuxiao Ye, Ruisheng Cao, Dongchan Shin, Hongjin Su, Zhaoqing Suo, Hongcheng Gao, Wenjing Hu, Pengcheng Yin, Victor Zhong, Caiming Xiong, Ruoxi Sun, Qian Liu, Sida Wang, and Tao Yu. Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows. 2024. https://arxiv.org/abs/2411.07763. ICLR 2025 Oral.
- [22] Derong Xu, Wei Chen, Wenjun Peng, Chao Zhang, Tong Xu, Xiangyu Zhao, Xian Wu, Yefeng Zheng, Yang Wang, and Enhong Chen. Large Language Models for Generative Information Extraction: A Survey. arXiv preprint arXiv:2312.17617, 2024. https://arxiv.org/abs/2312.17617
- [23] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, and Peter Clark. Self-Refine: Iterative Refinement with Self-Feedback. arXiv preprint arXiv:2303.17651, 2023.
- [24] Elliot Meyerson, Giuseppe Paolo, Roberto Dailey, Hormoz Shahrzad, Olivier Francon, Conor F. Hayes, Xin Qiu, Babak Hodjat, and Risto Miikkulainen. Solving a Million-Step LLM Task with Zero Errors. arXiv preprint arXiv:2511.09030, 2025. doi:10.48550/arXiv.2511.09030. https://arxiv.org/abs/2511.09030
- [25] John Dagdelen, Alexander Dunn, Sanghoon Lee, Nicholas Walker, Andrew S. Rosen, Gerbrand Ceder, Kristin A. Persson, and Anubhav Jain. Structured Information Extraction from Scientific Text with Large Language Models. Nature Communications, 15(1):1418, 2024. https://www.nature.com/articles/s41467-024-45563-x
- [26] Haolun Wu, Ye Yuan, Liana Mikaelyan, Alexander Meulemans, Xue Liu, James Hensman, and Bhaskar Mitra. Learning to Extract Structured Entities Using Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6817–6834. Association for Computational Linguistics, 2024.
- [27] Yujie Luo, Xiangyuan Ru, Kangwei Liu, Lin Yuan, Mengshu Sun, Ningyu Zhang, Lei Liang, Zhiqiang Zhang, Jun Zhou, Lanning Wei, Da Zheng, Haofen Wang, and Huajun Chen. OneKE: A Dockerized Schema-Guided LLM Agent-Based Knowledge Extraction System. In Companion Proceedings of the ACM Web Conference 2025 (WWW Companion '25), 2025. doi:10.1145/3701716.3715189
- [28] Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D'Antoni. Grammar-Aligned Decoding. In Advances in Neural Information Processing Systems (NeurIPS), 2024. https://arxiv.org/abs/2405.21047
- [29] Peter Buneman, Sanjeev Khanna, and Wang-Chiew Tan. Why and Where: A Characterization of Data Provenance. In International Conference on Database Theory (ICDT), 2001. https://homepages.inf.ed.ac.uk/opb/papers/ICDT2001.pdf
- [30] Brandon Hong et al. Context Rot: How Increasing Input Tokens Impacts LLM Performance. Technical report, Chroma, 2025. https://research.trychroma.com/context-rot
- [31] YAML Language Development Team. YAML Ain't Markup Language (YAML) Version 1.2.2. https://yaml.org/spec/1.2.2/, 2021.
- [32] JSON Schema Authors. JSON Schema: A Media Type for Describing JSON Documents (draft 2020-12). https://json-schema.org/draft/2020-12/json-schema-core, 2020.
- [33] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing Reasoning and Acting in Language Models. arXiv preprint arXiv:2210.03629, 2022. https://arxiv.org/abs/2210.03629
- [34] Uta Störl, Meike Klettke, and Stefanie Scherzinger. NoSQL Schema Evolution and Data Migration. In Proceedings of the 23rd International Conference on Extending Database Technology (EDBT), 2020. https://openproceedings.org/2020/conf/edbt/paper_T4.pdf
- [35] Alberto Hernández Chillón, Meike Klettke, Diego Sevilla Ruiz, and Jesús García Molina. A Generic Schema Evolution Approach for NoSQL and Relational Databases. IEEE Transactions on Knowledge and Data Engineering, 2024.
- [36] Hui Wen Goh and Jonas Mueller. LLM Structured Output Benchmarks Are Riddled with Mistakes. 2025. https://cleanlab.ai/blog/structured-output-benchmark/. Accessed 2026-04-16.
- [37] Topoteretes. Cognee GitHub repository and README. https://github.com/topoteretes/cognee, 2026. Accessed 2026-04-22.
- [38] Mem0. Mem0 Documentation: Build with Mem0. https://docs.mem0.ai/introduction
- [39] Supermemory. Supermemory Documentation: Overview — What is Supermemory? https://supermemory.ai/docs/intro, 2026. Accessed 2026-04-22.
- [40] Zep. Zep documentation and platform overview. https://www.getzep.com, 2026. Accessed 2026-04-24.
- [41] Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, and Dong Yu. LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory. arXiv preprint arXiv:2410.10813, 2024. doi:10.48550/arXiv.2410.10813. https://arxiv.org/abs/2410.10813
- [42] Letta. Benchmarking AI Agent Memory. https://www.letta.com/blog/benchmarking-ai-agent-memory, 2026. Accessed 2026-04-24.
- [43] snap-research and community contributors. LoCoMo issue discussion: dataset label quality estimate. https://github.com/snap-research/locomo/issues/27#issuecomment-3921992262, 2025. Accessed 2026-04-24.