pith. machine review for the scientific record.

arxiv: 2605.13481 · v1 · submitted 2026-05-13 · 💻 cs.CL

Recognition: unknown

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents


Pith reviewed 2026-05-14 19:11 UTC · model grok-4.3

classification 💻 cs.CL
keywords knowledge graphs · LLM agents · graph retrieval · planning mechanism · retrieval augmented generation · factual correctness · hallucination reduction · personalized AI

The pith

PAI-2 improves LLM factual accuracy through adaptive graph traversal and planning on knowledge graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents PersonalAI 2.0, a framework that augments large language models with external knowledge graphs using a multistage pipeline for query processing. This pipeline enables adaptive and iterative searches that extract entities, match graph vertices, and generate clue queries to direct retrieval. Tests across six benchmarks show gains in answer correctness, with graph traversal methods outperforming flat retrievers by 6 percent on average and the planning component adding an 18 percent lift when enabled. The work targets reductions in hallucination for personalized LLM agents that need structured, context-aware reasoning.
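The extract-match-retrieve loop described above can be sketched in miniature. This is an assumption-laden toy, not PAI-2's implementation: the function names are invented, the graph is a plain dict of `vertex -> [(relation, neighbor)]`, and keyword matching stands in for the paper's LLM-driven entity extraction and clue-query generation.

```python
# Hypothetical sketch of a PAI-2-style iterative query pipeline.
# All names and data structures here are illustrative assumptions.

def extract_entities(query, graph):
    # Stand-in for LLM entity extraction: match query tokens to vertices.
    tokens = set(query.lower().split())
    return [v for v in graph if v.lower() in tokens]

def retrieve(graph, vertices):
    # Collect the facts attached to the matched vertices.
    facts = []
    for v in vertices:
        facts.extend(graph[v])
    return facts

def generate_clue_query(facts, seen):
    # Stand-in for the LLM clue-query step: pick an unvisited entity
    # mentioned in a retrieved fact to direct the next hop.
    for _, neighbor in facts:
        if neighbor not in seen:
            return neighbor
    return None

def answer(query, graph, max_hops=3):
    """Adaptive multistage search: extract -> match -> retrieve -> clue-query."""
    seen = set(extract_entities(query, graph))
    frontier = list(seen)
    facts = []
    for _ in range(max_hops):
        facts += retrieve(graph, frontier)
        clue = generate_clue_query(facts, seen)
        if clue is None:        # search stops when no new clue emerges
            break
        seen.add(clue)
        frontier = [clue]
    return facts
```

On a two-hop toy graph ("alice works_at acme", "acme located_in berlin"), asking about Alice surfaces both facts, illustrating why iterative clue-queries beat a single flat retrieval pass.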

Core claim

PAI-2 performs adaptive, iterative information search guided by extracted entities, matched graph vertices, and generated clue-queries within a dynamic multistage pipeline. On Natural Questions, TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue, and DiaASQ it outperforms LightRAG, RAPTOR, and HippoRAG 2, delivering a 4 percent average gain by LLM-as-a-Judge across four benchmarks. Graph traversal algorithms such as BeamSearch and WaterCircles improve results by 6 percent over standard flatten retrievers, while the search-plan enhancement mechanism supplies an 18 percent boost compared with the disabled version across the six datasets. PAI-2 also reaches a state-of-the-art 89 percent information-retention score on the MINE-1 benchmark using LLMs from the 7-14B tier.
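The paper names BeamSearch among its traversal algorithms. A generic beam search over a toy graph looks roughly like the following; the path-scoring function is an invented stand-in for whatever relevance model PAI-2 actually uses.

```python
from heapq import nlargest

# Hedged sketch of beam search on a knowledge graph: keep only the
# top-`beam_width` partial paths per hop, ranked by a scoring function.
# Graph layout and scoring are illustrative assumptions.

def beam_search(graph, start, score, beam_width=2, depth=3):
    """Expand paths hop by hop, pruning to the best `beam_width` each round."""
    beams = [[start]]
    for _ in range(depth):
        candidates = []
        for path in beams:
            for neighbor in graph.get(path[-1], []):
                if neighbor not in path:          # avoid revisiting vertices
                    candidates.append(path + [neighbor])
        if not candidates:                        # no frontier left to expand
            break
        beams = nlargest(beam_width, candidates, key=score)
    return beams
```

With a per-vertex relevance table as the score, the search keeps the high-relevance branch while a flat retriever would rank all neighbors once and stop, which is the intuition behind the reported traversal-versus-flatten gap.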

What carries the argument

The adaptive multistage query processing pipeline that guides iterative graph search through extracted entities, matched vertices, and generated clue-queries.

Load-bearing premise

The reported gains from planning and traversal will generalize beyond the six tested datasets and the specific LLMs used, and LLM-as-a-Judge will measure factual correctness without its own biases.

What would settle it

Evaluating PAI-2 on an additional benchmark or with a different LLM family and finding no gain or a decline in factual correctness scores would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.13481 by Alexander Kharitonov, Alina Bogdanova, Artyom Sosedka, Ekaterina Lisitsyna, Evgeny Burnaev, Ilia Perepechkin, Matvey Iskornev, Mikhail Belkin, Mikhail Menschikov, Ruslan Kostoev, Victoria Dochkina.

Figure 1 · view at source ↗
Figure 2 · view at source ↗
Figure 37 · view at source ↗
read the original abstract

We introduce PersonalAI 2.0 (PAI-2), a novel framework, designed to enhance large language model (LLM) based systems through integration of external knowledge graphs (KG). The proposed approach addresses key limitations of existing Graph Retrieval-Augmented Generation (GraphRAG) methods by incorporating a dynamic, multistage query processing pipeline. The central point of PAI-2 design is its ability to perform adaptive, iterative information search, guided by extracted entities, matched graph vertices and generated clue-queries. Conducted evaluation over six benchmarks (Natural Questions, TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue and DiaASQ) demonstrates improvement in factual correctness of generating answers compared to analogues methods (LightRAG, RAPTOR, and HippoRAG 2). PAI-2 achieves 4% average gain by LLM-as-a-Judge across four benchmarks, reflecting its effectiveness in reducing hallucination rates and increasing precision. We show that use of graph traversal algorithms (e.g. BeamSearch, WaterCircles) gain superior results compared to standard flatten retriever on average 6%, while enabled search plan enhancement mechanism gain 18% boost compared to disabled one by LLM-as-a-Judge across six datasets. In addition, ablation study reveals that PAI-2 achieves the SOTA result on MINE-1 benchmark, achieving 89% information-retention score, using LLMs from 7-14B tiers. Collectively, these findings underscore the potential of PAI-2 to serve as a foundational model for next-generation personalized AI applications, requiring scalable, context-aware knowledge representation and reasoning capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces PersonalAI 2.0 (PAI-2), a framework integrating external knowledge graphs into LLM systems via a dynamic multistage query pipeline that performs adaptive iterative search using extracted entities, matched vertices, and generated clue-queries. It claims empirical improvements over LightRAG, RAPTOR, and HippoRAG 2 on six benchmarks (Natural Questions, TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue, DiaASQ), including a 4% average gain by LLM-as-a-Judge across four benchmarks, 6% average superiority from graph traversal algorithms (e.g., BeamSearch, WaterCircles) versus flatten retrievers, an 18% boost from the enabled search-plan enhancement mechanism across six datasets, and SOTA 89% information-retention on the MINE-1 benchmark using 7-14B LLMs.

Significance. If the quantitative claims hold under rigorous validation, PAI-2 would offer a concrete advance in GraphRAG by demonstrating the value of planning and traversal mechanisms for reducing hallucinations and improving precision in personalized agents. The ablation results isolating the 18% contribution of the search-plan component and the SOTA result on MINE-1 with modest-sized models constitute reproducible evidence of component-level gains that could inform next-generation context-aware KG systems.

major comments (2)
  1. [Abstract] Abstract and Evaluation section: The headline performance claims (4% average LLM-as-a-Judge gain on four benchmarks, 6% traversal improvement, 18% search-plan boost) are presented without any description of the experimental protocol, including data splits, judge-model choice, prompt template for the LLM-as-a-Judge, statistical significance tests, error bars, or controls for confounds such as output length or stylistic bias. This absence directly undermines the central assertion that PAI-2 reduces hallucination rates and increases factual precision.
  2. [Evaluation] Evaluation section: No validation of the LLM-as-a-Judge metric against human judgments, inter-annotator agreement scores, or bias analysis is provided, despite the metric being the sole basis for all reported gains and the claim of superior factual correctness over baselines.
minor comments (1)
  1. [Abstract] Abstract: 'analogues methods' should read 'analogous methods'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important gaps in methodological transparency. We will revise the manuscript to incorporate detailed experimental protocols and a validation study for the LLM-as-a-Judge metric, thereby strengthening the presentation of our results.

read point-by-point responses
  1. Referee: [Abstract] Abstract and Evaluation section: The headline performance claims (4% average LLM-as-a-Judge gain on four benchmarks, 6% traversal improvement, 18% search-plan boost) are presented without any description of the experimental protocol, including data splits, judge-model choice, prompt template for the LLM-as-a-Judge, statistical significance tests, error bars, or controls for confounds such as output length or stylistic bias. This absence directly undermines the central assertion that PAI-2 reduces hallucination rates and increases factual precision.

    Authors: We acknowledge that the abstract and Evaluation section lack explicit descriptions of the experimental protocol. In the revised manuscript, we will expand the Evaluation section to detail the data splits used, the specific judge model and its version, the full prompt template for LLM-as-a-Judge, results from statistical significance tests, error bars on all reported metrics, and controls for confounds including output length and stylistic bias. The abstract will be updated to reference these additions. revision: yes

  2. Referee: [Evaluation] Evaluation section: No validation of the LLM-as-a-Judge metric against human judgments, inter-annotator agreement scores, or bias analysis is provided, despite the metric being the sole basis for all reported gains and the claim of superior factual correctness over baselines.

    Authors: We agree that direct validation of the LLM-as-a-Judge metric is necessary. We will add a new subsection in the revised Evaluation section reporting a human validation study, including agreement rates between LLM-as-a-Judge scores and human annotations, inter-annotator agreement metrics, and an analysis of potential biases. This will be based on a sampled subset of the benchmark outputs. revision: yes
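One concrete form the promised judge-validation analysis could take is chance-corrected agreement between LLM-as-a-Judge verdicts and human labels, e.g. Cohen's kappa. The sketch below is illustrative, not from the paper, and the labels in the usage are invented for demonstration.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters over paired labels."""
    assert len(a) == len(b) and a, "need equal-length, non-empty label lists"
    n = len(a)
    # Observed agreement: fraction of items the two raters label identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independence, from each rater's label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[l] * cb[l] for l in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)
```

For example, a judge agreeing with humans on 3 of 4 verdicts, with marginals of 3/1 vs 2/2 correct/wrong, yields kappa = (0.75 - 0.5) / 0.5 = 0.5; reporting such a figure alongside the benchmark gains would address the referee's second major comment.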

Circularity Check

0 steps flagged

No circularity: empirical benchmark gains reported directly from external evaluations

full rationale

The paper introduces PAI-2 as an engineering framework for KG-enhanced LLM agents and supports its claims exclusively through direct empirical comparisons on six named external benchmarks (Natural Questions, TriviaQA, HotpotQA, etc.). Reported improvements (4% average by LLM-as-a-Judge, 6% from traversal algorithms, 18% from search-plan enhancement) are presented as measured outcomes against baselines such as LightRAG and HippoRAG 2, with an ablation study on MINE-1. No equations, fitted parameters, self-definitional quantities, or predictions derived from internal inputs appear in the provided text; the derivation chain consists of system description followed by independent benchmark scoring rather than any reduction of results to the method's own definitions or prior self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the system appears to compose existing knowledge-graph traversal algorithms and LLMs without introducing new ungrounded constructs.

pith-pipeline@v0.9.0 · 5656 in / 1129 out tokens · 29165 ms · 2026-05-14T19:11:58.043032+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

40 extracted references · 5 canonical work pages · 3 internal anchors

  1. [1]

    Qwen3 technical report, 2025

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin ...

  2. [2]

    DeepSeek-AI, Aixin Liu, Bei Feng, Bing Xue, Bingxuan Wang, Bochao Wu, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Haowei Zhang, Honghui Ding, Hua...

  3. [3]

    GLM-4.5: Agentic, reasoning, and coding (ARC) foundation models, 2025

    GLM-4.5 Team, Aohan Zeng, Xin Lv, Qinkai Zheng, Zhenyu Hou, Bin Chen, Chengxing Xie, Cunxiang Wang, Da Yin, Hao Zeng, Jiajie Zhang, Kedong Wang, Lucen Zhong, Mingdao Liu, Rui Lu, Shulin Cao, Xiaohan Zhang, Xuancheng Huang, Yao Wei, Yean Cheng, Yifan An, Yilin Niu, Yuanhao Wen, Yushi Bai, Zhengxiao Du, Zihan Wang, Zilin Zhu, Bohan Zhang, Bosi Wen, Bowen Wu, ...

  4. [4]

    Wikontic: Constructing Wikidata-aligned, ontology-aware knowledge graphs with large language models

    Alla Chepurova, Aydar Bulatov, Mikhail Burtsev, and Yuri Kuratov. Wikontic: Constructing Wikidata-aligned, ontology-aware knowledge graphs with large language models. In Vera Demberg, Kentaro Inui, and Lluís Marquez, editors, Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Pap...

  5. [5]

    Autoschemakg: Autonomous knowledge graph construction through dynamic schema induction from web-scale corpora, 2025

    Jiaxin Bai, Wei Fan, Qi Hu, Qing Zong, Chunyang Li, Hong Ting Tsang, Hongyu Luo, Yauwai Yim, Haoyu Huang, Xiao Zhou, Feng Qin, Tianshi Zheng, Xi Peng, Xin Yao, Huiwen Yang, Leijie Wu, Yi Ji, Gong Zhang, Renhai Chen, and Yangqiu Song. Autoschemakg: Autonomous knowledge graph construction through dynamic schema induction from web-scale corpora, 2025

  6. [6]

    T-grag: A dynamic graphrag framework for resolving temporal conflicts and redundancy in knowledge retrieval

    Dong Li, Yichen Niu, Ying Ai, Xiang Zou, Biqing Qi, and Jianxing Liu. T-grag: A dynamic graphrag framework for resolving temporal conflicts and redundancy in knowledge retrieval. In Proceedings of the 33rd ACM International Conference on Multimedia, MM '25, pages 11880–11889, New York, NY, USA, 2025. Association for Computing Machinery

  7. [7]

    Retrieval-Augmented Generation for Large Language Models: A Survey

    Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jin Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. Retrieval-augmented generation for large language models: A survey. ArXiv, abs/2312.10997, 2023

  8. [8]

    Graph retrieval-augmented generation: A survey

    Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, and Siliang Tang. Graph retrieval-augmented generation: A survey. ACM Trans. Inf. Syst., 44(2), December 2025

  9. [9]

    GRAG: Graph retrieval-augmented generation

    Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. GRAG: Graph retrieval-augmented generation. In Luis Chiruzzo, Alan Ritter, and Lu Wang, editors, Findings of the Association for Computational Linguistics: NAACL 2025, pages 4145–4157, Albuquerque, New Mexico, April 2025. Association for Computational Linguistics

  10. [10]

    GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs

    Costas Mavromatis and George Karypis. GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar, editors, Findings of the Association for Computational Linguistics: ACL 2025, pages 16682–16699, Vienna, Austria, July 2025. Association for...

  11. [11]

    Graph-constrained reasoning: Faithful reasoning on knowledge graphs with large language models

    Linhao Luo, Zicheng Zhao, Chen Gong, Gholamreza Haffari, and Shirui Pan. Graph-constrained reasoning: Faithful reasoning on knowledge graphs with large language models. In Forty-second International Conference on Machine Learning, 2025

  12. [12]

    PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM Agents

    Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina, Ruslan Kostoev, Ilia Perepechkin, Petr Anokhin, Nikita Semenov, and Evgeny Burnaev. PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM Agents. IEEE Access, 14:58262–58281, 2026

  13. [13]

    Think-on-graph: Deep and responsible reasoning of large language model on knowledge graph

    Jiashuo Sun, Chengjin Xu, Lumingyuan Tang, Saizhuo Wang, Chen Lin, Yeyun Gong, Lionel M. Ni, Heung-Yeung Shum, and Jian Guo. Think-on-graph: Deep and responsible reasoning of large language model on knowledge graph. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024

  14. [14]

    Reasoning on graphs: Faithful and interpretable large language model reasoning

    Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, and Shirui Pan. Reasoning on graphs: Faithful and interpretable large language model reasoning. In International Conference on Learning Representations, 2024

  15. [15]

    Debate on graph: a flexible and reliable reasoning framework for large language models

    Jie Ma, Zhitao Gao, Qi Chai, Wangchun Sun, Pinghui Wang, Hongbin Pei, Jing Tao, Lingyun Song, Jun Liu, Chen Zhang, et al. Debate on graph: a flexible and reliable reasoning framework for large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 24768–24776, 2025

  16. [16]

    An enhanced prompt-based llm reasoning scheme via knowledge graph-integrated collaboration

    Yihao Li, Ru Zhang, Jianyi Liu, and Gongshen Liu. An enhanced prompt-based llm reasoning scheme via knowledge graph-integrated collaboration. ArXiv, abs/2402.04978, 2024

  17. [17]

    Enhancing large language models with pseudo- and multisource-knowledge graphs for open-ended question answering

    Jiaxiang Liu, Tong Zhou, Yubo Chen, Kang Liu, and Jun Zhao. Enhancing large language models with pseudo- and multisource-knowledge graphs for open-ended question answering. ArXiv, abs/2402.09911, 2025

  18. [18]

    Natural questions: A benchmark for question answering research

    Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Jacob Devlin, Kenton Lee, Kristina Toutanova, Llion Jones, Matthew Kelcey, Ming-Wei Chang, Andrew M. Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. Natural questions: A benchmark for question answerin...

  19. [19]

    TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension

    Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. In Regina Barzilay and Min-Yen Kan, editors, Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1601–1611, Vancouver, Canada, July

  20. [20]

    Association for Computational Linguistics

  21. [21]

    HotpotQA: A dataset for diverse, explainable multi-hop question answering

    Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun'ichi Tsujii, editors, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Proc...

  22. [22]

    Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps

    Xanh Ho, Anh-Khoa Duong Nguyen, Saku Sugawara, and Akiko Aizawa. Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps. In Donia Scott, Nuria Bel, and Chengqing Zong, editors, Proceedings of the 28th International Conference on Computational Linguistics, pages 6609–6625, Barcelona, Spain (Online), December 2020. Internationa...

  23. [23]

    Musique: Multihop questions via single-hop question composition

    Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Musique: Multihop questions via single-hop question composition. Transactions of the Association for Computational Linguistics, 10:539–554, 05 2022

  24. [24]

    DiaASQ: A benchmark of conversational aspect-based sentiment quadruple analysis

    Bobo Li, Hao Fei, Fei Li, Yuhan Wu, Jinsong Zhang, Shengqiong Wu, Jingye Li, Yijiang Liu, Lizi Liao, Tat-Seng Chua, and Donghong Ji. DiaASQ: A benchmark of conversational aspect-based sentiment quadruple analysis. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 1...

  25. [25]

    Bleu: a method for automatic evaluation of machine translation

    Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics, pages 311–318, 2002

  26. [26]

    Rouge: A package for automatic evaluation of summaries

    Chin-Yew Lin. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out, pages 74–81, 2004

  27. [27]

    Meteor universal: Language specific translation evaluation for any target language

    Michael Denkowski and Alon Lavie. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the ninth workshop on statistical machine translation, pages 376–380, 2014

  28. [28]

    BERTScore: Evaluating Text Generation with BERT

    Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, and Yoav Artzi. BERTScore: Evaluating text generation with BERT. ArXiv, abs/1904.09675, 2019

  29. [29]

    Judging llm-as-a-judge with mt-bench and chatbot arena

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems, 36:46595–46623, 2023

  30. [30]

    Computing krippendorff’s alpha-reliability

    Klaus Krippendorff. Computing krippendorff’s alpha-reliability. 2011

  31. [31]

    LightRAG: Simple and fast retrieval-augmented generation

    Zirui Guo, Lianghao Xia, Yanhua Yu, Tu Ao, and Chao Huang. LightRAG: Simple and fast retrieval-augmented generation. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, pages 10746–10761, Suzhou, China, November 2025. Association for Computa...

  32. [32]

    Raptor: Recursive abstractive processing for tree-organized retrieval

    Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. Raptor: Recursive abstractive processing for tree-organized retrieval. In International Conference on Learning Representations (ICLR), 2024

  33. [33]

    From rag to memory: Non-parametric continual learning for large language models, 2025

    Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. From rag to memory: Non-parametric continual learning for large language models, 2025

  34. [34]

    Arigraph: Learning knowledge graph world models with episodic memory for llm agents

    Petr Anokhin, Nikita Semenov, Artyom Y. Sorokin, Dmitry Evseev, Mikhail S. Burtsev, and Evgeny Burnaev. Arigraph: Learning knowledge graph world models with episodic memory for llm agents. In International Joint Conference on Artificial Intelligence, 2024

  35. [35]

    Lost in the middle: How language models use long contexts

    Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024

  36. [36]

    Zep: A Temporal Knowledge Graph Architecture for Agent Memory

    Preston Rasmussen, Pavlo Paliychuk, Travis Beauvais, Jack Ryan, and Daniel Chalef. Zep: A temporal knowledge graph architecture for agent memory. ArXiv, abs/2501.13956, 2025

  37. [37]

    Extract, define, canonicalize: An LLM-based framework for knowledge graph construction

    Bowen Zhang and Harold Soh. Extract, define, canonicalize: An LLM-based framework for knowledge graph construction. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9820–9836, Miami, Florida, USA, November 2024. Association for Computational Linguistics

  38. [38]

    Kggen: Extracting knowledge graphs from plain text with language models, 2025

    Belinda Mo, Kyssen Yu, Joshua Kazdan, Joan Cabezas, Proud Mpala, Lisa Yu, Chris Cundy, Charilaos Kanatsoulis, and Sanmi Koyejo. Kggen: Extracting knowledge graphs from plain text with language models, 2025

  39. [39]

    From local to global: A graph rag approach to query-focused summarization, 2025

    Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph rag approach to query-focused summarization, 2025
