Recognition: 1 theorem link · Lean theorem
AtomicRAG: Atom-Entity Graphs for Retrieval-Augmented Generation
Pith reviewed 2026-05-16 03:34 UTC · model grok-4.3
The pith
Breaking knowledge into atomic facts and linking them through simple existence edges in a graph improves retrieval accuracy and robustness over chunk-based RAG methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The Atom-Entity Graph stores knowledge as discrete atomic facts rather than text chunks and uses edges that only record whether a relationship exists between entities. Personalized PageRank run on this graph, followed by relevance-based filtering, produces more accurate and complete retrieval sets for generation. Because atoms can be combined without interference from unrelated facts inside the same chunk, the method supports diverse query perspectives while avoiding propagation of relation-extraction mistakes that break reasoning paths in triple-based graphs.
What carries the argument
The Atom-Entity Graph, in which each node holds one self-contained factual atom and each edge simply marks the existence of a connection, processed by personalized PageRank plus relevance filtering to select query-aligned paths.
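As a concrete illustration of this pipeline, here is a minimal sketch using networkx: a bipartite graph of atoms and entities with unlabeled (existence-only) edges, personalized PageRank seeded on query entities, then a relevance filter. The example atoms, the entity linking, and the keyword-overlap relevance score are illustrative placeholders, not the paper's implementation.

```python
# Minimal sketch of atom-entity retrieval as described above. Not the authors'
# code: atoms, entity links, and the relevance score are toy placeholders.
import networkx as nx

atoms = {
    "a1": "Marie Curie won the Nobel Prize in Physics in 1903.",
    "a2": "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "a3": "Pierre Curie shared the 1903 Nobel Prize in Physics.",
}
entities_of = {
    "a1": ["Marie Curie", "Nobel Prize in Physics"],
    "a2": ["Marie Curie", "Nobel Prize in Chemistry"],
    "a3": ["Pierre Curie", "Nobel Prize in Physics"],
}

# Each node is either an atomic fact or an entity; edges only record that a
# connection exists, with no relation label that extraction could get wrong.
G = nx.Graph()
for atom_id, ents in entities_of.items():
    G.add_node(atom_id, kind="atom")
    for e in ents:
        G.add_node(e, kind="entity")
        G.add_edge(atom_id, e)  # existence-only edge

def retrieve(query_entities, relevance, k=2, threshold=0.1):
    """Personalized PageRank seeded on query entities, then relevance filtering."""
    seeds = {e: 1.0 for e in query_entities if e in G}
    scores = nx.pagerank(G, alpha=0.85, personalization=seeds)
    ranked = [(n, s) for n, s in scores.items() if G.nodes[n]["kind"] == "atom"]
    ranked.sort(key=lambda x: -x[1])
    return [n for n, _ in ranked if relevance(atoms[n]) >= threshold][:k]

# Toy relevance function: keyword overlap stands in for embedding similarity.
query = "Which Nobel Prize did Marie Curie win in 1903?"
rel = lambda text: len(set(text.lower().split()) & set(query.lower().split())) / 10
print(retrieve(["Marie Curie", "Nobel Prize in Physics"], rel))
```

Because the seed mass flows from query entities to the atoms they touch, an atom linked to several query entities (here a1) outranks atoms linked to only one, without any relation labels on the edges.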
If this is right
- Retrieval can adapt to queries that require only a subset of facts from what was originally one chunk without losing precision.
- Reasoning paths remain intact even when relation extraction would have produced a wrong triple.
- Knowledge elements can be added or removed at atom granularity without rewriting entire chunks or rebuilding large parts of the graph.
- The same index supports multiple downstream tasks that each need different combinations of facts from the source material.
- Overall accuracy and robustness improve on standard RAG benchmarks when the atom-entity structure replaces chunk-triple graphs.
Where Pith is reading between the lines
- The method may make it easier to update the knowledge base incrementally, since only affected atoms need re-indexing.
- It could reduce dependence on high-quality relation extraction models, shifting effort toward accurate atomic fact segmentation.
- Similar atom-level decomposition might benefit other graph retrieval settings outside RAG, such as multi-hop question answering over documents.
- If atom extraction quality is high, the approach could extend naturally to multimodal sources where each modality contributes separate atomic units.
Load-bearing premise
Decomposing text into individual atomic facts and connecting entities only by existence edges will preserve all necessary context and avoid introducing new extraction errors that offset the gains in flexibility.
What would settle it
Replace the atom decomposition step with either full original chunks or randomly split sentences while keeping the same graph construction and PageRank procedure. If retrieval accuracy and reasoning scores on the five benchmarks no longer exceed the chunk-based baselines, the advantage of atomic units is falsified. A minimal harness for this test is sketched below.
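One minimal shape for that falsification harness, with the decomposition step as the only swapped component. The functions `extract_atoms`, `build_graph`, and `evaluate` are hypothetical placeholders standing in for the paper's pipeline, not its actual code.

```python
# Sketch of the proposed falsification test: hold graph construction, PPR, and
# evaluation fixed, and vary only how source text is decomposed into units.
import random

def extract_atoms(chunk):
    # Stand-in for LLM-based atomic fact extraction (the condition under test).
    return [s.strip() for s in chunk.split(".") if s.strip()]

def whole_chunk(chunk):
    # Control 1: keep the original chunk as a single unit.
    return [chunk]

def random_splits(chunk, seed=0):
    # Control 2: arbitrary splits with no regard for fact boundaries.
    words = chunk.split()
    cut = random.Random(seed).randint(1, max(1, len(words) - 1))
    parts = [" ".join(words[:cut]), " ".join(words[cut:])]
    return [p for p in parts if p]

def run_condition(corpus, decompose, build_graph, evaluate, benchmarks):
    units = [u for chunk in corpus for u in decompose(chunk)]
    graph = build_graph(units)  # identical indexing + PPR across conditions
    return {b: evaluate(graph, b) for b in benchmarks}

# The atomic-unit claim survives only if extract_atoms beats both controls on
# retrieval accuracy and reasoning scores across all five benchmarks.
```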
Original abstract
Recent GraphRAG methods integrate graph structures into text indexing and retrieval, using knowledge graph triples to connect text chunks, thereby improving retrieval coverage and precision. However, we observe that treating text chunks as the basic unit of knowledge representation rigidly groups multiple atomic facts together, limiting the flexibility and adaptability needed to support diverse retrieval scenarios. Additionally, triple-based entity linking is sensitive to relation-extraction errors, which can lead to missing or incorrect reasoning paths and ultimately hurt retrieval accuracy. To address these issues, we propose the Atom-Entity Graph, a more precise and reliable architecture for knowledge representation and indexing. In our approach, knowledge is stored as knowledge atoms, namely individual, self-contained units of factual information, rather than coarse-grained text chunks. This allows knowledge elements to be flexibly reassembled without mutual interference, thereby enabling seamless alignment with diverse query perspectives. Edges between entities simply indicate whether a relationship exists. By combining personalized PageRank with relevance-based filtering, we maintain accurate entity connections and improve the reliability of reasoning. Theoretical analysis and experiments on five public benchmarks show that the proposed AtomicRAG algorithm outperforms strong RAG baselines in retrieval accuracy and reasoning robustness. Code: https://github.com/7HHHHH/AtomicRAG.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes AtomicRAG, which represents knowledge as fine-grained, self-contained 'knowledge atoms' (individual factual units) linked by simple existence-only edges between entities, rather than coarse text chunks or relation triples. Retrieval combines personalized PageRank with relevance-based filtering to produce more flexible and accurate results. The central claim is that this architecture improves retrieval accuracy and reasoning robustness over strong RAG baselines, supported by theoretical analysis and experiments on five public benchmarks.
Significance. If the empirical gains and theoretical arguments hold under scrutiny, the work could meaningfully advance GraphRAG by reducing relation-extraction errors and chunk rigidity, enabling more adaptable retrieval for diverse queries. The public code release supports reproducibility and potential adoption.
major comments (3)
- §3 (Method): The atom extraction procedure is described at a high level but lacks a concrete algorithm, prompt template, or validation metric for atom completeness. Without this, it is impossible to assess whether the claimed flexibility gains come at the cost of omitted qualifiers or merged facts, directly affecting the central accuracy claim.
- §4 (Experiments): The results on the five benchmarks report outperformance but provide no details on baseline implementations, hyperparameter controls, statistical significance, or variance across runs. An ablation isolating the contribution of relevance filtering versus PPR is also missing, leaving open whether the gains are robust or artifactual.
- Theoretical analysis (referenced in the abstract and §5): The analysis is invoked to explain why existence edges plus PPR improve reasoning paths, yet no key lemmas, assumptions, or proof sketches appear. This weakens the ability to evaluate whether the architecture genuinely mitigates the triple-error and chunk-rigidity problems identified in the introduction.
minor comments (2)
- Abstract: The phrase 'strong RAG baselines' should explicitly name the compared methods (e.g., standard GraphRAG, HippoRAG) to give immediate context for the claimed gains.
- §3.2 / Figure 2: The atom-entity graph visualization would benefit from an example showing how a multi-fact sentence is split into atoms and re-linked, to clarify context preservation.
Simulated Author's Rebuttal
We appreciate the referee's thorough review and valuable suggestions. We agree with the points raised and will make the necessary revisions to enhance the manuscript's clarity, reproducibility, and rigor.
Point-by-point responses
-
Referee: §3 (Method): The atom extraction procedure is described at a high level but lacks a concrete algorithm, prompt template, or validation metric for atom completeness. Without this, it is impossible to assess whether the claimed flexibility gains come at the cost of omitted qualifiers or merged facts, directly affecting the central accuracy claim.
Authors: We agree that additional details are needed for the atom extraction procedure. In the revised version, we will provide the complete algorithm in pseudocode, the full prompt template used for extracting knowledge atoms from text, and a validation approach involving manual inspection of atom quality on sampled documents to ensure completeness and avoid merging or omitting facts. This will allow readers to better evaluate the flexibility gains. revision: yes
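The paper's actual prompt is not reproduced in this review. Purely as an illustration of what the promised template might look like, an extraction prompt could take roughly the following shape; every constraint and field name below is our assumption, not the authors'.

```python
# Hypothetical atom-extraction prompt; the authors' real template may differ
# in wording, constraints, and output format.
ATOM_PROMPT = """\
Decompose the passage into atomic facts. Each fact must:
1. express exactly one relationship or attribute;
2. be self-contained, with all pronouns and implicit references resolved;
3. preserve qualifiers such as dates, quantities, and conditions.
Return one JSON object per line: {{"atom": "...", "entities": ["..."]}}

Passage:
{passage}
"""

print(ATOM_PROMPT.format(passage="Marie Curie won Nobel Prizes in 1903 and 1911."))
```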
-
Referee: §4 (Experiments): The results on the five benchmarks report outperformance but provide no details on baseline implementations, hyperparameter controls, statistical significance, or variance across runs. An ablation isolating the contribution of relevance filtering versus PPR is also missing, leaving open whether the gains are robust or artifactual.
Authors: We acknowledge the lack of experimental details. The revised manuscript will include full specifications of baseline implementations (with references to their original papers and our re-implementations), all hyperparameter values and tuning procedures, statistical significance tests (e.g., paired t-tests with p-values), and standard deviations from multiple runs. Additionally, we will add an ablation study that isolates the effects of relevance filtering and personalized PageRank to demonstrate the contribution of each component. revision: yes
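For the promised significance reporting, a sketch of the kind of per-query paired test the authors describe; the score lists below are invented placeholder values, not results from the paper.

```python
# Illustrative variance and significance reporting across matched queries;
# the per-query F1 values are made-up placeholders, not measured results.
import statistics
from scipy import stats

atomicrag_f1 = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]
baseline_f1  = [0.58, 0.49, 0.69, 0.41, 0.60, 0.57]

t, p = stats.ttest_rel(atomicrag_f1, baseline_f1)  # paired t-test, same queries
print(f"mean {statistics.mean(atomicrag_f1):.3f} vs {statistics.mean(baseline_f1):.3f}")
print(f"stdev {statistics.stdev(atomicrag_f1):.3f} vs {statistics.stdev(baseline_f1):.3f}")
print(f"paired t = {t:.2f}, p = {p:.4f}")
```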
-
Referee: Theoretical analysis (referenced in the abstract and §5): The analysis is invoked to explain why existence edges plus PPR improve reasoning paths, yet no key lemmas, assumptions, or proof sketches appear. This weakens the ability to evaluate whether the architecture genuinely mitigates the triple-error and chunk-rigidity problems identified in the introduction.
Authors: The theoretical analysis section will be expanded in the revision. We will include explicit assumptions (such as the independence of atomic facts and the connectivity properties of existence-only edges), a key lemma regarding reduced error propagation in reasoning paths compared to triple-based graphs, and a proof sketch demonstrating how personalized PageRank enhances path reliability. This will directly address how the architecture mitigates the issues of relation-extraction errors and chunk rigidity. revision: yes
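To make the promised lemma concrete, one plausible form it could take, under our own simplifying assumption that extraction errors are independent per edge, is the following comparison. This is a reconstruction for illustration, not the paper's statement: with entity-linking error rate $\varepsilon_e$ and relation-extraction error rate $\varepsilon_r$, a $k$-hop path in a triple graph survives only if every entity and every relation label on it is correct, while existence-only edges drop the relation factor.

```latex
% Hypothetical error-propagation comparison (illustrative, not from the paper).
\Pr[\text{$k$-hop path intact, triple graph}]
  = (1-\varepsilon_e)^{k+1}\,(1-\varepsilon_r)^{k},
\qquad
\Pr[\text{$k$-hop path intact, existence-only edges}]
  = (1-\varepsilon_e)^{k+1}.
```

Eliminating the $(1-\varepsilon_r)^{k}$ factor is exactly the benefit such a lemma would formalize; the authors' actual assumptions and statement may differ.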
Circularity Check
No circularity: claims rest on external benchmarks and independent theoretical analysis
full rationale
The paper defines AtomicRAG via atom-entity graphs with existence-only edges, personalized PageRank, and relevance filtering, then supports its superiority through theoretical analysis plus direct empirical comparison on five public benchmarks. No equations, fitted parameters, or predictions are presented that reduce by construction to the input data or to self-citations; the atom extraction and edge simplification steps are architectural choices whose performance is measured externally rather than assumed. Self-citations, if present, are not load-bearing for the central accuracy claim. The derivation chain is therefore self-contained and non-circular.
Axiom & Free-Parameter Ledger
invented entities (1)
- knowledge atom: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tagged: unclear)
  Rationale: the relation between the paper passage and the cited Recognition theorem is ambiguous.
  Matched passage: "knowledge is stored as knowledge atoms... Edges between entities simply indicate whether a relationship exists... personalized PageRank with relevance-based filtering"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Agarwal, O. S., et al. gpt-oss-120b & gpt-oss-20b model card. 2025.
- [2] Asai, A., Wu, Z., Wang, Y., Sil, A., and Hajishirzi, H. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. arXiv, abs/2310.11511, 2023.
- [3] Chen, S., Zhou, C., Yuan, Z., Zhang, Q., Cui, Z., Chen, H., Xiao, Y., Cao, J., and Huang, X. You don't need pre-built graphs for RAG: Retrieval-augmented generation with adaptive reasoning structures. arXiv, abs/2508.06105, 2025.
- [4] Chen, T., Wang, H., Chen, S., Yu, W., Ma, K., Zhao, X., Yu, D., and Zhang, H. Dense X retrieval: What retrieval granularity should we use? In Conference on Empirical Methods in Natural Language Processing, 2023.
- [5] CircleMind-AI. FastGraphRAG: High-speed graph-based retrieval-augmented generation. CircleMind-AI Blog, 2024.
- [6] Edge, D., Trinh, H., and Larson, J. LazyGraphRAG: Setting a new standard for quality and cost. Microsoft Blog, 2024.
- [7] Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A. N., Truitt, S., and Larson, J. From local to global: A Graph RAG approach to query-focused summarization. arXiv, abs/2404.16130, 2024.
- [8] Gao, L., Ma, X., Lin, J. J., and Callan, J. Precise zero-shot dense retrieval without relevance labels. In Annual Meeting of the Association for Computational Linguistics, 2022.
- [9] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Guo, Q., Wang, M., and Wang, H. Retrieval-augmented generation for large language models: A survey. arXiv, abs/2312.10997, 2023.
- [10] Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025a.
- [11] Guo, Y., Su, M., Guan, S., Sun, Z., Jin, X., Guo, J., and Cheng, X. RouteRAG: Efficient retrieval-augmented generation from text and graph via reinforcement learning. 2025b.
- [12] Guo, Z., Xia, L., Yu, Y., Ao, T., and Huang, C. LightRAG: Simple and fast retrieval-augmented generation. arXiv, abs/2410.05779, 2024.
- [13] Gutiérrez, B. J., Shu, Y., Gu, Y., Yasunaga, M., and Su, Y. HippoRAG: Neurobiologically inspired long-term memory for large language models. arXiv, abs/2405.14831, 2024.
- [14] Gutiérrez, B. J., Shu, Y., Qi, W., Zhou, S., and Su, Y. From RAG to memory: Non-parametric continual learning for large language models. arXiv, abs/2502.14802, 2025.
- [15] Ho, X., Nguyen, A., Sugawara, S., and Aizawa, A. Constructing a multi-hop QA dataset for comprehensive evaluation of reasoning steps. arXiv, abs/2011.01060, 2020.
- [16] Hou, Y., Zhou, S., Liang, K., Meng, L., Chen, X., Xu, K., Wang, S., Liu, X., and Huang, J. Soft reasoning paths for knowledge graph completion. In International Joint Conference on Artificial Intelligence, 2025.
- [17] Hu, Y., Lei, Z., Zhang, Z., Pan, B., Ling, C., and Zhao, L. GRAG: Graph retrieval-augmented generation. arXiv, abs/2405.16506, 2024.
- [18] Huang, Y., Zhang, S., and Xiao, X. KET-RAG: A cost-efficient multi-granular indexing framework for Graph-RAG. arXiv preprint arXiv:2502.09304, 2025.
- [19] Izacard, G. and Grave, E. Leveraging passage retrieval with generative models for open domain question answering. arXiv, abs/2007.01282, 2020.
- [20] Jiang, H., Wu, Q., Lin, C.-Y., Yang, Y., and Qiu, L. LLMLingua: Compressing prompts for accelerated inference of large language models. In Conference on Empirical Methods in Natural Language Processing, 2023.
- [21] Karpukhin, V., Oğuz, B., Min, S., Lewis, P., Wu, L. Y., Edunov, S., Chen, D., and Yih, W.-t. Dense passage retrieval for open-domain question answering. arXiv, abs/2004.04906, 2020.
- [22] Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv, abs/2005.11401, 2020.
- [23] Li, Z., Chen, X., Yu, H., Lin, H., Lu, Y., Tang, Q., Huang, F., Han, X., Sun, L., and Li, Y. StructRAG: Boosting knowledge-intensive reasoning of LLMs via inference-time hybrid information structurization. arXiv, abs/2410.08815, 2024.
- [24] Luo, H., Chen, G., Zheng, Y., Wu, X., Guo, Y., Lin, Q., Feng, Y., Kuang, Z., Song, M., Zhu, Y., et al. HyperGraphRAG: Retrieval-augmented generation via hypergraph-structured knowledge representation. arXiv preprint arXiv:2503.21322, 2025a.
- [25]
- [26] Luo, L., Zhao, Z., Haffari, G., Phung, D., Gong, C., and Pan, S. GFM-RAG: Graph foundation model for retrieval-augmented generation. arXiv, abs/2502.01113, 2025c.
- [27] Mavromatis, C. and Karypis, G. GNN-RAG: Graph neural retrieval for efficient large language model reasoning on knowledge graphs. In Annual Meeting of the Association for Computational Linguistics, 2025.
- [28] Peng, B., Zhu, Y., Liu, Y., Bo, X., Shi, H., Hong, C., Zhang, Y., and Tang, S. Graph retrieval-augmented generation: A survey. ACM Transactions on Information Systems, 2024.
- [29] Qian, H., Zhang, P., Liu, Z., Mao, K., and Dou, Z. MemoRAG: Moving towards next-gen RAG via memory-inspired knowledge discovery. arXiv preprint arXiv:2409.05591, 2024.
- [30] Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. RAPTOR: Recursive abstractive processing for tree-organized retrieval. arXiv, abs/2401.18059, 2024.
- [31] Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. arXiv, abs/2305.15294, 2023.
- [32] Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. ♫ MuSiQue: Multihop questions via single-hop question composition. Transactions of the Association for Computational Linguistics, 10:539-554, 2021.
- [33] Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. arXiv, abs/2212.10509, 2022.
- [34] Wang, S., Yang, H., and Liu, W. Research on the construction and application of retrieval-enhanced generation (RAG) model based on knowledge graph. Scientific Reports, 15, 2025.
- [35] Wang, Y., Lipka, N., Rossi, R. A., Siu, A. F., Zhang, R., and Derr, T. Knowledge graph prompting for multi-document question answering. In AAAI Conference on Artificial Intelligence, 2023.
- [36] Xiang, Z., Wu, C., Zhang, Q., Chen, S., Hong, Z., Huang, X., and Su, J. When to use graphs in RAG: A comprehensive analysis for graph retrieval-augmented generation. arXiv, abs/2506.05690, 2025.
- [37] Xiong, W., Li, X. L., Iyer, S., Du, J., Lewis, P., Wang, W. Y., Mehdad, Y., Yih, W.-t., Riedel, S., Kiela, D., and Oğuz, B. Answering complex open-domain questions with multi-hop dense retrieval. arXiv, abs/2009.12756, 2020.
- [38] Yang, A., Yang, B., Zhang, B., et al. Qwen2.5 technical report. arXiv, abs/2412.15115, 2024.
- [39] Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., and Manning, C. D. HotpotQA: A dataset for diverse, explainable multi-hop question answering. 2018.
- [40] Zhang, Q., Chen, S., Bei, Y.-Q., Yuan, Z., Zhou, H., Hong, Z., Dong, J., Chen, H., Chang, Y., and Huang, X. A survey of graph retrieval-augmented generation for customized large language models. arXiv, abs/2501.13958, 2025.