pith. machine review for the scientific record.

arxiv: 2404.16130 · v2 · submitted 2024-04-24 · 💻 cs.CL · cs.AI · cs.IR

Recognition: no theorem link

From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Alex Chao, Apurva Mody, Darren Edge, Dasha Metropolitansky, Ha Trinh, Jonathan Larson, Joshua Bradley, Newman Cheng, Robert Osazuwa Ness, Steven Truitt

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 05:07 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.IR
keywords GraphRAG · Retrieval-Augmented Generation · Query-Focused Summarization · Entity Knowledge Graph · Community Summaries · Global Question Answering · Large Language Models

The pith

GraphRAG builds entity knowledge graphs and community summaries to answer global questions over large private text collections more comprehensively than standard RAG.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GraphRAG to address a limitation of standard retrieval-augmented generation: it fails on broad, corpus-wide questions, which are really query-focused summarization tasks. GraphRAG uses an LLM in two stages, first to extract an entity knowledge graph from the source documents and then to generate summaries for communities of closely related entities. When a question arrives, the system produces a partial response from each community summary and combines those into a single final answer. The authors test this on global sensemaking questions over datasets in the one-million-token range and report substantial gains in both comprehensiveness and diversity of answers compared with a conventional RAG baseline.

Core claim

GraphRAG constructs a graph index by deriving an entity knowledge graph from the source documents and then pregenerating community summaries for all groups of closely related entities. Given a question, each community summary is used to generate a partial response, and all partial responses are then summarized into a final answer. The authors report substantial improvements over a conventional RAG baseline in both comprehensiveness and diversity for global sensemaking questions over datasets in the 1-million-token range.

What carries the argument

Two-stage LLM-based graph indexing that first builds an entity knowledge graph and then pregenerates community summaries for groups of related entities, which are used to create and aggregate partial responses.
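The two-stage index described above can be sketched in a few dozen lines. `llm_extract_entities` and `llm_summarize` are hypothetical stand-ins for LLM calls, and connected components stand in for the hierarchical community detection the paper relies on; this is a minimal illustration of the shape of the pipeline, not the authors' implementation:

```python
from collections import defaultdict

def llm_extract_entities(chunk):
    # Placeholder for an LLM extraction call returning (entities, relations).
    # Faked here: capitalized words are "entities"; adjacent entities
    # within a chunk are treated as "related".
    ents = [w.strip(".,") for w in chunk.split() if w[:1].isupper()]
    return ents, list(zip(ents, ents[1:]))

def llm_summarize(texts):
    # Placeholder for an LLM summarization call.
    return " / ".join(texts)

def build_graph_index(chunks):
    """Stage 1: build an entity knowledge graph.
    Stage 2: pregenerate one summary per entity community."""
    adj = defaultdict(set)
    mentions = defaultdict(list)          # entity -> source chunks
    for chunk in chunks:
        ents, rels = llm_extract_entities(chunk)
        for e in ents:
            mentions[e].append(chunk)
        for a, b in rels:
            adj[a].add(b)
            adj[b].add(a)
    # Communities via connected components (a crude stand-in for the
    # hierarchical community detection used in the paper; entities with
    # no relations are skipped in this sketch).
    seen, communities = set(), []
    for node in list(adj):
        if node in seen:
            continue
        comp, stack = [], [node]
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.append(n)
            stack.extend(adj[n] - seen)
        communities.append(comp)
    # Stage 2: summarize the source text behind each community.
    return [llm_summarize(sorted({c for e in comp for c in mentions[e]}))
            for comp in communities]
```

With two unrelated chunks, the sketch yields two communities and therefore two pregenerated summaries, one per entity group.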

If this is right

  • GraphRAG can answer questions that require understanding an entire document collection rather than isolated passages.
  • The method scales query-focused summarization to the same quantities of text handled by typical RAG systems.
  • Partial responses from community summaries can be synthesized into final answers that improve both breadth and variety over direct retrieval.
  • The two-stage indexing allows the system to handle both narrow retrieval questions and broad sensemaking questions within one framework.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on domain-specific corpora such as legal contracts or scientific papers where global pattern detection is valuable.
  • If community detection quality varies, the method might benefit from iterative refinement of the graph index based on question type.
  • Hybrid systems could route local questions to standard RAG and global questions to GraphRAG without changing the underlying LLM.
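The hybrid-routing idea in the last bullet could be prototyped with a thin dispatcher. The keyword cues and function names below are illustrative assumptions, not anything from the paper; a real router would more plausibly use an LLM classifier:

```python
# Hypothetical surface cues suggesting a corpus-wide (global) question.
GLOBAL_CUES = ("main themes", "overall", "across the", "what are the")

def route_query(question, local_rag, graph_rag):
    """Dispatch local questions to standard RAG and global ones to
    GraphRAG. `local_rag` and `graph_rag` are callables taking the
    question; the heuristic below is a stand-in for a learned router."""
    is_global = any(cue in question.lower() for cue in GLOBAL_CUES)
    return graph_rag(question) if is_global else local_rag(question)
```

The underlying LLM is untouched; only the retrieval path changes per query.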

Load-bearing premise

LLM-generated entity graphs and community summaries accurately and comprehensively capture the source material without introducing errors, omissions, or biases that undermine the final combined responses.

What would settle it

A human evaluation on a corpus with independently verified global themes in which GraphRAG answers show no measurable gain in comprehensiveness or diversity, or in which the community summaries omit or distort major themes present in the raw text.
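Such an experiment could be harnessed as a head-to-head comparison over questions with independently verified themes. The judge here is an abstract callable (a human rater or an LLM); the protocol is a sketch under those assumptions, not the paper's evaluation design:

```python
import random

def evaluate_pairwise(questions, answer_a, answer_b, judge):
    """Count head-to-head wins per system on one criterion
    (e.g. comprehensiveness). `judge(question, first, second)`
    returns 0 if the first shown answer wins, 1 if the second.
    Presentation order is shuffled to cancel position bias."""
    wins = [0, 0]                     # wins[0]: system A, wins[1]: system B
    for q in questions:
        answers = [answer_a(q), answer_b(q)]
        order = [0, 1]
        random.shuffle(order)         # randomize which system is shown first
        winner_pos = judge(q, answers[order[0]], answers[order[1]])
        wins[order[winner_pos]] += 1  # map shown position back to system
    return wins
```

A null result for GraphRAG under this protocol, with a judge anchored to the verified themes, would be the kind of evidence the paragraph above describes.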

read the original abstract

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. Our approach uses an LLM to build a graph index in two stages: first, to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that GraphRAG leads to substantial improvements over a conventional RAG baseline for both the comprehensiveness and diversity of generated answers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes GraphRAG, a two-stage LLM-driven indexing method that first extracts an entity knowledge graph from source documents and then generates community summaries over related entity groups. For global sensemaking queries, it produces partial answers from each community summary and applies a final map-reduce summarization step. The central empirical claim is that this yields substantial gains in answer comprehensiveness and diversity relative to a conventional RAG baseline on corpora of approximately 1 million tokens.

Significance. If the reported gains prove robust under detailed evaluation, the work would meaningfully advance RAG systems by addressing their documented weakness on global queries through graph-based indexing and hierarchical summarization. The approach is an empirical engineering contribution that combines existing ideas in a scalable way for private corpora.

major comments (2)
  1. [Abstract and §4 (Experiments)] The central claim of 'substantial improvements' in comprehensiveness and diversity is stated without quantitative results, exact metric definitions, dataset descriptions, baseline implementation details, or statistical significance tests. This information is load-bearing for assessing whether the gains arise from the graph structure rather than from additional LLM calls.
  2. [§3 (Method)] The two-stage indexing (entity KG construction followed by community summarization) is presented without human validation, inter-annotator agreement scores, or ablation against oracle graphs. Because downstream partial responses and the final summary are also LLM-generated, systematic extraction errors or omissions would propagate directly into the reported gains, yet no such checks are described.
minor comments (1)
  1. [§3.3] The description of how community summaries are combined in the final response step could be clarified with a short pseudocode or diagram to make the map-reduce flow explicit.
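The map-reduce flow the referee wants spelled out might look like the following sketch, where `llm` is a placeholder for a chat-completion call and both prompts are illustrative rather than the paper's:

```python
def llm(prompt):
    # Placeholder for a real LLM call (e.g. a chat-completion API).
    return f"[answer drawn from: {prompt[:60]}...]"

def answer_global_question(question, community_summaries):
    # Map: each community summary yields a partial response; empty or
    # unhelpful partials can be filtered before the reduce step.
    partials = []
    for summary in community_summaries:
        partial = llm(f"Using only this summary:\n{summary}\n"
                      f"Answer: {question}")
        if partial:
            partials.append(partial)
    # Reduce: summarize all partial responses into one final answer.
    return llm("Combine these partial answers into one response to "
               f"'{question}':\n" + "\n".join(partials))
```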

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below, indicating where revisions will be made to improve clarity and rigor.

read point-by-point responses
  1. Referee: [Abstract and §4 (Experiments)] The central claim of 'substantial improvements' in comprehensiveness and diversity is stated without quantitative results, exact metric definitions, dataset descriptions, baseline implementation details, or statistical significance tests. This information is load-bearing for assessing whether the gains arise from the graph structure rather than from additional LLM calls.

    Authors: We agree that the abstract and §4 would be strengthened by explicit quantitative details. In the revised manuscript we will update the abstract to reference key quantitative findings from the experiments and expand §4 to provide exact metric definitions (human Likert-scale ratings for comprehensiveness and diversity), dataset descriptions, baseline implementation specifics, and statistical significance results. We will also add analysis that isolates the contribution of the graph indexing from the total number of LLM calls. revision: yes

  2. Referee: [§3 (Method)] The two-stage indexing (entity KG construction followed by community summarization) is presented without human validation, inter-annotator agreement scores, or ablation against oracle graphs. Because downstream partial responses and the final summary are also LLM-generated, systematic extraction errors or omissions would propagate directly into the reported gains, yet no such checks are described.

    Authors: We acknowledge the value of validating the intermediate indexing steps. We will revise §3 to discuss potential error propagation from LLM-based entity and community extraction and include any available internal checks or related evidence. A full-scale human validation or oracle-graph ablation is resource-intensive at the corpus scale, but we will add a limitation statement and, where feasible, a small-scale comparison to better contextualize the results. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical engineering contribution with independent evaluation

full rationale

The paper proposes GraphRAG as a two-stage LLM-based indexing method (entity KG construction followed by community summarization) for global query-focused summarization, then reports empirical gains in comprehensiveness and diversity over a standard RAG baseline on 1M-token datasets. No equations, first-principles derivations, fitted parameters, or predictions appear in the abstract or described method. The central claim is an empirical comparison rather than a reduction to inputs by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are referenced. The evaluation metrics and baseline are external to the indexing process itself, making the derivation chain self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

No free parameters or invented entities are introduced; the approach rests on standard assumptions about LLM extraction capabilities and graph community structure.

axioms (2)
  • domain assumption: Large language models can extract entities and relations from source text to form a usable knowledge graph.
    Invoked in the first stage of index construction.
  • domain assumption: Communities of related entities identified via graph algorithms yield summaries that collectively support global question answering.
    Invoked in the second stage and response generation.

pith-pipeline@v0.9.0 · 5579 in / 1270 out tokens · 34713 ms · 2026-05-11T05:07:17.015621+00:00 · methodology

discussion (0)


Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

    cs.AI 2026-05 conditional novelty 8.0

    MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for ...

  2. ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

    cs.CR 2026-05 unverdicted novelty 8.0

    ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.

  3. Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

    cs.CR 2026-05 unverdicted novelty 8.0

    Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...

  4. Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

    cs.AI 2026-05 unverdicted novelty 7.0

    PyRAG turns multi-hop reasoning into executable Python code over retrieval tools for explicit, verifiable step-by-step RAG.

  5. MEME: Multi-entity & Evolving Memory Evaluation

    cs.LG 2026-05 unverdicted novelty 7.0

    All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.

  6. Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

    cs.AI 2026-05 unverdicted novelty 7.0

    Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.

  7. MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

    cs.LG 2026-05 unverdicted novelty 7.0

    MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

  8. DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning

    cs.CL 2026-05 unverdicted novelty 7.0

    DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.

  9. MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

    cs.AI 2026-05 unverdicted novelty 7.0

    MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.

  10. SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards

    eess.SP 2026-05 unverdicted novelty 7.0

    SEM-RAG compiles telecommunication standards into structure-preserving graphs and uses entropy-guided retrieval to reach 94.1% accuracy on TeleQnA and 93.8% on ORAN-Bench-13K while reducing indexing token usage compar...

  11. When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

    cs.AI 2026-05 unverdicted novelty 7.0

    A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

  12. The Context Gathering Decision Process: A POMDP Framework for Agentic Search

    cs.AI 2026-05 accept novelty 7.0

    Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no perf...

  13. MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

    cs.CL 2026-05 unverdicted novelty 7.0

    MANTRA automatically synthesizes SMT-validated compliance benchmarks for LLM agents from natural language manuals and tool schemas, producing 285 tasks across 6 domains with minimal human effort.

  14. SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

    cs.CL 2026-05 unverdicted novelty 7.0

    SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

  15. MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

    cs.MA 2026-05 unverdicted novelty 7.0

    MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

  16. Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

    cs.CL 2026-05 unverdicted novelty 7.0

    MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

  17. XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

    cs.AI 2026-04 unverdicted novelty 7.0

    XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

  18. Skill Retrieval Augmentation for Agentic AI

    cs.CL 2026-04 unverdicted novelty 7.0

    Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.

  19. A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

    cs.AI 2026-04 unverdicted novelty 7.0

    A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.

  20. Structure Guided Retrieval-Augmented Generation for Factual Queries

    cs.IR 2026-04 unverdicted novelty 7.0

    SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.

  21. ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation

    cs.CL 2026-04 unverdicted novelty 7.0

    ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.

  22. STRIDE: Strategic Iterative Decision-Making for Retrieval-Augmented Multi-Hop Question Answering

    cs.AI 2026-04 unverdicted novelty 7.0

    STRIDE uses a meta-planner for entity-agnostic reasoning skeletons and a supervisor for dependency-aware execution to improve retrieval-augmented multi-hop QA.

  23. SAGER: Self-Evolving User Policy Skills for Recommendation Agent

    cs.IR 2026-04 unverdicted novelty 7.0

    SAGER equips LLM recommendation agents with per-user evolving policy skills via two-representation architecture, contrastive CoT diagnosis, and skill-augmented listwise reasoning, yielding SOTA gains orthogonal to mem...

  24. ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback

    cs.AI 2026-04 unverdicted novelty 7.0

    ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.

  25. MisEdu-RAG: A Misconception-Aware Dual-Hypergraph RAG for Novice Math Teachers

    cs.IR 2026-04 unverdicted novelty 7.0

    MisEdu-RAG builds concept and instance hypergraphs for two-stage retrieval of pedagogical knowledge and student errors, improving feedback quality on the MisstepMath benchmark by 10.95% token-F1 and up to 15.3% on res...

  26. AnnoRetrieve: Efficient Structured Retrieval for Unstructured Document Analysis

    cs.IR 2026-04 unverdicted novelty 7.0

    AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.

  27. Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems

    cs.IR 2026-04 unverdicted novelty 7.0

    Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.

  28. Cognifold: Always-On Proactive Memory via Cognitive Folding

    cs.AI 2026-05 unverdicted novelty 6.0

    Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-or...

  29. IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation

    cs.AI 2026-05 unverdicted novelty 6.0

    IdeaForge combines multiple innovation methodologies through specialist agents on a persistent knowledge graph, using cross-methodology convergent claim linkages to rank and draft patent claims with higher traceabilit...

  30. PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents

    cs.CL 2026-05 unverdicted novelty 6.0

    PRISM achieves higher accuracy than baselines on long-horizon agent tasks at an order-of-magnitude smaller context budget by combining hierarchical bundle search, query-sensitive costing, evidence compression, and ada...

  31. SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory

    cs.AI 2026-05 unverdicted novelty 6.0

    SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...

  32. SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs

    cs.CL 2026-05 unverdicted novelty 6.0

    SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and updates the graph from RL trajectories to boost compositional task performance.

  33. Leveraging RAG for Training-Free Alignment of LLMs

    cs.LG 2026-05 unverdicted novelty 6.0

    RAG-Pref is a training-free RAG-based alignment technique that conditions LLMs on contrastive preference samples during inference, yielding over 3.7x average improvement in agentic attack refusals when combined with o...

  34. ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

    cs.CL 2026-05 conditional novelty 6.0

    Intent-aware retrieval over assertion-labeled knowledge graphs improves clinical QA accuracy by 22 percentage points on a new MIMIC-IV benchmark that stresses negation, temporality, and attribution.

  35. ASTRA-QA: A Benchmark for Abstract Question Answering over Documents

    cs.CL 2026-05 unverdicted novelty 6.0

    ASTRA-QA is a benchmark for abstract document question answering that uses explicit topic sets, unsupported content annotations, and evidence alignments to enable direct scoring of coverage and hallucination.

  36. SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution

    cs.CL 2026-05 unverdicted novelty 6.0

    SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.

  37. HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

    cs.AI 2026-05 unverdicted novelty 6.0

    HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.

  38. Generating Leakage-Free Benchmarks for Robust RAG Evaluation

    cs.CL 2026-05 unverdicted novelty 6.0

    SeedRG generates novel, leakage-free RAG benchmark examples from seed data by mapping reasoning structures and swapping entities while applying consistency and leakage checks.

  39. LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation

    cs.IR 2026-05 unverdicted novelty 6.0

    LARAG improves RAG answer quality on hyperlinked technical documentation by using author-defined links for retrieval, achieving higher BERTScore while using fewer chunks and tokens than standard embedding-based RAG.

  40. Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings

    cs.IR 2026-05 unverdicted novelty 6.0

    Embeddings retrieve same-subfield papers at 45-52% but same-agenda papers at only 15-21%; citation rerank reaches 57-59% on agenda queries.

  41. Query-efficient model evaluation using cached responses

    cs.LG 2026-05 unverdicted novelty 6.0

    DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.

  42. WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems

    cs.CL 2026-05 conditional novelty 6.0

    WiCER iteratively diagnoses and repairs fact loss during wiki compilation for LLMs, recovering 80% of quality lost in blind distillation across 17 domains while cutting catastrophic failures by 55%.

  43. Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries

    cs.CL 2026-05 unverdicted novelty 6.0

    GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.

  44. ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting

    cs.AI 2026-05 unverdicted novelty 6.0

    ScrapMem introduces optical forgetting to compress multimodal memories for LLM agents on edge devices, cutting storage by up to 93% while reaching 51.0% Joint@10 and 70.3% Recall@10 on ATM-Bench.

  45. CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

    cs.CL 2026-05 unverdicted novelty 6.0

    CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tune...

  46. Retrieval and Multi-Hop Reasoning in 1M-Token Context Windows: Evaluating LLMs on Classical Chinese Text

    cs.AI 2026-05 unverdicted novelty 6.0

    Frontier LLMs solve single-needle retrieval at 1M tokens on classical Chinese but show three distinct accuracy-decay patterns in three-hop reasoning between 256K and 1M tokens.

  47. Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization

    cs.CL 2026-05 unverdicted novelty 6.0

    Judge-R1 improves LLM judgment document generation by combining agentic legal information retrieval with GRPO-based rubric-guided optimization, outperforming baselines on the JuDGE benchmark.

  48. From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction

    cs.AI 2026-04 unverdicted novelty 6.0

    Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.

  49. ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era

    cs.AI 2026-04 unverdicted novelty 6.0

    ObjectGraph is a Markdown superset file format that represents documents as traversable knowledge graphs, achieving up to 95.3% token reduction for agents with no significant accuracy loss.

  50. Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

    cs.AI 2026-04 unverdicted novelty 6.0

    Grounding LLMs via node-wise anchors in a traffic scenario taxonomy improves law-scenario matching by 29.1% and derived requirement accuracy by 36.9-38.2% on Chinese laws and 5,897 scenarios, enabling a compliance lay...

  51. Adaptive Defense Orchestration for RAG: A Sentinel-Strategist Architecture against Multi-Vector Attacks

    cs.CR 2026-04 unverdicted novelty 6.0

    A context-aware Sentinel-Strategist system for RAG selectively applies defenses to block membership inference and data poisoning while recovering most retrieval utility compared to always-on defense stacks.

  52. To Know is to Construct: Schema-Constrained Generation for Agent Memory

    cs.CL 2026-04 unverdicted novelty 6.0

    SCG-MEM reformulates agent memory access as schema-constrained generation within dynamic cognitive schemas, using assimilation and accommodation for updates plus an associative graph for reasoning, and outperforms ret...

  53. GraphRAG-IRL: Personalized Recommendation with Graph-Grounded Inverse Reinforcement Learning and LLM Re-ranking

    cs.IR 2026-04 unverdicted novelty 6.0

    GraphRAG-IRL fuses graph-grounded MaxEnt IRL pre-ranking with persona-guided LLM re-ranking to deliver up to 16.8% NDCG@10 gains over IRL-only baselines on MovieLens and consistent 4-6% gains on KuaiRand.

  54. DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

    cs.AI 2026-04 unverdicted novelty 6.0

    DW-Bench shows tool-augmented LLMs outperform static ones on data warehouse graph reasoning but plateau on hard compositional question subtypes.

  55. EHRAG: Bridging Semantic Gaps in Lightweight GraphRAG via Hybrid Hypergraph Construction and Retrieval

    cs.AI 2026-04 unverdicted novelty 6.0

    EHRAG constructs structural hyperedges from sentence co-occurrence and semantic hyperedges from entity embedding clusters, then applies hybrid diffusion plus topic-aware PPR to retrieve top-k documents, outperforming ...

  56. MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

    cs.IR 2026-04 unverdicted novelty 6.0

    MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.

  57. MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

    cs.IR 2026-04 unverdicted novelty 6.0

    MemSearch-o1 uses reasoning-aligned memory growth from seed tokens, retracing via contribution functions, and path reorganization to mitigate memory dilution in LLM agentic search.

  58. EvoRAG: Making Knowledge Graph-based RAG Automatically Evolve through Feedback-driven Backpropagation

    cs.DB 2026-04 unverdicted novelty 6.0

    EvoRAG adds a feedback-driven backpropagation step that attributes response quality to individual knowledge-graph triplets and updates the graph to raise reasoning accuracy by 7.34 percent over prior KG-RAG methods.

  59. Learning Chain Of Thoughts Prompts for Predicting Entities, Relations, and even Literals on Knowledge Graphs

    cs.CL 2026-04 unverdicted novelty 6.0

    RALP learns string-based chain-of-thought prompts as scoring functions for knowledge graph triples using Bayesian optimization from fewer than 30 examples, improving link prediction MRR by over 5% and achieving over 8...

  60. Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs

    cs.CL 2026-04 unverdicted novelty 6.0

    Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · cited by 104 Pith papers · 7 internal anchors

  1. [1]

    GPT-4 Technical Report

    Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. (2023). Gpt-4 technical report. arXiv preprint arXiv:2303.08774

  2. [2]

    Gemini: A Family of Highly Capable Multimodal Models

    Anil, R., Borgeaud, S., Wu, Y., Alayrac, J.-B., Yu, J., Soricut, R., Schalkwyk, J., Dai, A. M., Hauth, A., et al. (2023). Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805

  3. [3]

    Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering

    Baek, J., Aji, A. F., and Saffari, A. (2023). Knowledge-augmented language model prompting for zero-shot knowledge graph question answering. arXiv preprint arXiv:2306.04136

  4. [4]

    Ban, T., Chen, L., Wang, X., and Chen, H. (2023). From query tools to causal architects: Harnessing large language models for advanced causal discovery from data

  5. [5]

    Neural Networks for Entity Matching: A Survey

    Barlaug, N. and Gulla, J. A. (2021). Neural networks for entity matching: A survey. ACM Transactions on Knowledge Discovery from Data (TKDD) , 15(3):1--37

  6. [6]

    Baumel, T., Eyal, M., and Elhadad, M. (2018). Query focused abstractive summarization: Incorporating query relevance, multi-document coverage, and summary length constraints into seq2seq models. arXiv preprint arXiv:1801.07704

  7. [7]

    Fast Unfolding of Communities in Large Networks

    Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment , 2008(10):P10008

  8. [8]

    Language Models Are Few-Shot Learners

    Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. (2020). Language models are few-shot learners. Advances in neural information processing systems , 33:1877--1901

  9. [9]

    Cheng, X., Luo, D., Chen, X., Liu, L., Zhao, D., and Yan, R. (2024). Lift yourself up: Retrieval-augmented text generation with self-memory. Advances in Neural Information Processing Systems , 36

  10. [10]

    The Data Matching Process

    Christen, P. (2012). The data matching process. Springer

  11. [11]

    GraSPy: Graph Statistics in Python

    Chung, J., Pedigo, B. D., Bridgeford, E. W., Varjavand, B. K., Helm, H. S., and Vogelstein, J. T. (2019). Graspy: Graph statistics in python. Journal of Machine Learning Research , 20(158):1--7

  12. [12]

    Dang, H. T. (2006). Duc 2005: Evaluation of question-focused summarization systems. In Proceedings of the Workshop on Task-Focused Summarization and Question Answering , pages 48--55

  13. [13]

    Duplicate Record Detection: A Survey

    Elmagarmid, A. K., Ipeirotis, P. G., and Verykios, V. S. (2006). Duplicate record detection: A survey. IEEE Transactions on knowledge and data engineering , 19(1):1--16

  14. [14]

    Es, S., James, J., Espinosa-Anke, L., and Schockaert, S. (2023). Ragas: Automated evaluation of retrieval augmented generation. arXiv preprint arXiv:2309.15217

  15. [15]

    Web-Scale Information Extraction in KnowItAll (Preliminary Results)

    Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A.-M., Shaked, T., Soderland, S., Weld, D. S., and Yates, A. (2004). Web-scale information extraction in knowitall: (preliminary results). In Proceedings of the 13th International Conference on World Wide Web , WWW '04, page 100–110, New York, NY, USA. Association for Computing Machinery

  16. [16]

    Feng, Z., Feng, X., Zhao, D., Yang, M., and Qin, B. (2023). Retrieval-generation synergy augmented large language models. arXiv preprint arXiv:2310.05149

  17. [17]

    Fortunato, S. (2010). Community detection in graphs. Physics reports , 486(3-5):75--174

  18. [18]

    Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., and Wang, H. (2023). Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997

  19. [19]

    G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

    He, X., Tian, Y., Sun, Y., Chawla, N. V., Laurent, T., LeCun, Y., Bresson, X., and Hooi, B. (2024). G-retriever: Retrieval-augmented generation for textual graph understanding and question answering. arXiv preprint arXiv:2402.07630

  20. [20]

    Large Language Models Cannot Self-Correct Reasoning Yet

    Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X., and Zhou, D. (2023). Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798

  21. [21]

    Jacomy, M., Venturini, T., Heymann, S., and Bastian, M. (2014). Forceatlas2, a continuous graph layout algorithm for handy network visualization designed for the gephi software. PLoS ONE 9(6): e98679. https://doi.org/10.1371/journal.pone.0098679

  22. [22]

    A Survey of Community Detection Approaches: From Statistical Modeling to Deep Learning

    Jin, D., Yu, Z., Jiao, P., Pan, S., He, D., Wu, J., Philip, S. Y., and Zhang, W. (2021). A survey of community detection approaches: From statistical modeling to deep learning. IEEE Transactions on Knowledge and Data Engineering , 35(2):1149--1170

  23. [23]

    Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation

    Kang, M., Kwak, J. M., Baek, J., and Hwang, S. J. (2023). Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv preprint arXiv:2305.18846

  24. [24]

    Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP

    Khattab, O., Santhanam, K., Li, X. L., Hall, D., Liang, P., Potts, C., and Zaharia, M. (2022). Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp. arXiv preprint arXiv:2212.14024

  25. [25]

    Kim, D., Xie, L., and Ong, C. S. (2016). Probabilistic knowledge graph construction: Compositional and incremental approaches. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management , CIKM '16, page 2257–2262, New York, NY, USA. Association for Computing Machinery

  26. [26]

    Kim, G., Kim, S., Jeon, B., Park, J., and Kang, J. (2023). Tree of clarifications: Answering ambiguous questions with retrieval-augmented large language models. arXiv preprint arXiv:2310.14696

  27. [27]

    Klein, G., Moon, B., and Hoffman, R. R. (2006). Making sense of sensemaking 1: Alternative perspectives. IEEE intelligent systems , 21(4):70--73

  28. [28]

    Kosinski, M. (2024). Evaluating large language models in theory of mind tasks. Proceedings of the National Academy of Sciences , 121(45):e2405460121

  29. [29]

    Kuratov, Y., Bulatov, A., Anokhin, P., Sorokin, D., Sorokin, A., and Burtsev, M. (2024). In search of needles in a 11m haystack: Recurrent memory finds what llms miss

  30. [30]

    Langchain graphs

    LangChain (2024). Langchain graphs. https://langchain-graphrag.readthedocs.io/en/latest/

  31. [31]

    Laskar, M. T. R., Hoque, E., and Huang, J. (2020). Query focused abstractive summarization via incorporating query relevance and transfer learning with transformer models. In Advances in Artificial Intelligence: 33rd Canadian Conference on Artificial Intelligence, Canadian AI 2020, Ottawa, ON, Canada, May 13--15, 2020, Proceedings 33 , pages 342--348. Springer

  32. [32]

    Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

    Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., et al. (2020). Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459--9474

  33. [33]

    Lost in the Middle: How Language Models Use Long Contexts

    Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. (2023). Lost in the middle: How language models use long contexts. arXiv:2307.03172

  34. [34]

    GraphRAG Implementation with LlamaIndex - V2

    LlamaIndex (2024). GraphRAG Implementation with LlamaIndex - V2 . https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/cookbooks/GraphRAG_v2.ipynb

  35. [35]

    Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. (2024). Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems , 36

  36. [36]

    Manakul, P., Liusie, A., and Gales, M. J. (2023). Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896

  37. [37]

    Mao, Y., He, P., Liu, X., Shen, Y., Gao, J., Han, J., and Chen, W. (2020). Generation-augmented retrieval for open-domain question answering. arXiv preprint arXiv:2009.08553

  38. [38]

    OpenOrd: An Open-Source Toolbox for Large Graph Layout

    Martin, S., Brown, W. M., Klavans, R., and Boyack, K. (2011). Openord: An open-source toolbox for large graph layout. SPIE Conference on Visualization and Data Analysis (VDA)

  39. [39]

    Melnyk, I., Dognin, P., and Das, P. (2022). Knowledge graph generation from text

  40. [40]

    Towards Effective Extraction and Evaluation of Factual Claims

    Metropolitansky, D. and Larson, J. (2025). Towards effective extraction and evaluation of factual claims

  41. [41]

    The impact of large language models on scientific discovery: a preliminary study using gpt-4

    Microsoft (2023). The impact of large language models on scientific discovery: a preliminary study using gpt-4

  42. [42]

    Mooney, R. J. and Bunescu, R. (2005). Mining knowledge from text using information extraction. SIGKDD Explor. Newsl. , 7(1):3–10

  43. [43]

    Nebulagraph launches industry-first graph rag: Retrieval-augmented generation with llm based on knowledge graphs

    NebulaGraph (2024). Nebulagraph launches industry-first graph rag: Retrieval-augmented generation with llm based on knowledge graphs. https://www.nebula-graph.io/posts/graph-RAG

  44. [44]

    Get started with graphrag: Neo4j’s ecosystem tools

    Neo4J (2024). Get started with graphrag: Neo4j’s ecosystem tools. https://neo4j.com/developer-blog/graphrag-ecosystem-tools/

  45. [45]

    Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the national academy of sciences , 103(23):8577--8582

  46. [46]

    Ni, J., Shi, M., Stammbach, D., Sachan, M., Ash, E., and Leippold, M. (2024). AFaCTA: Assisting the annotation of factual claim detection with reliable LLM annotators. In Ku, L.-W., Martins, A., and Srikumar, V., editors, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1890--1912, ...

  47. [47]

    Chatgpt: Gpt-4 language model

    OpenAI (2023). Chatgpt: Gpt-4 language model

  48. [48]

    Does Writing with Language Models Reduce Content Diversity?

    Padmakumar, V. and He, H. (2024). Does writing with language models reduce content diversity? ICLR

  49. [49]

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research , 12:2825--2830

  50. [50]

    Ram, O., Levine, Y., Dalmedigos, I., Muhlgay, D., Shashua, A., Leyton-Brown, K., and Shoham, Y. (2023). In-context retrieval-augmented language models. Transactions of the Association for Computational Linguistics , 11:1316--1331

  51. [51]

    Fabula: Intelligence Report Generation Using Retrieval-Augmented Narrative Construction

    Ranade, P. and Joshi, A. (2023). Fabula: Intelligence report generation using retrieval-augmented narrative construction. arXiv preprint arXiv:2310.13848

  52. [52]

    Salminen, J., Liu, C., Pian, W., Chi, J., Häyhänen, E., and Jansen, B. J. (2024). Deus ex machina and personas from large language models: Investigating the composition of ai-generated persona descriptions. In Proceedings of the CHI Conference on Human Factors in Computing Systems, pages 1--20

  53. [53]

    Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. (2024). Raptor: Recursive abstractive processing for tree-organized retrieval. arXiv preprint arXiv:2401.18059

  54. [54]

    Scott, K. (2024). Behind the Tech . https://www.microsoft.com/en-us/behind-the-tech

  55. [55]

    Shao, Z., Gong, Y., Shen, Y., Huang, M., Duan, N., and Chen, W. (2023). Enhancing retrieval-augmented large language models with iterative retrieval-generation synergy. arXiv preprint arXiv:2305.15294

  56. [56]

    Understanding Human-AI Workflows for Generating Personas

    Shin, J., Hedderich, M. A., Rey, B. J., Lucero, A., and Oulasvirta, A. (2024). Understanding human-ai workflows for generating personas. In Proceedings of the 2024 ACM Designing Interactive Systems Conference , pages 757--781

  57. [57]

    Shinn, N., Cassano, F., Gopinath, A., Narasimhan, K., and Yao, S. (2024). Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems , 36

  58. [58]

    CAiRE-COVID: A Question Answering and Query-Focused Multi-Document Summarization System for COVID-19 Scholarly Information Management

    Su, D., Xu, Y., Yu, T., Siddique, F. B., Barezi, E. J., and Fung, P. (2020). Caire-covid: A question answering and query-focused multi-document summarization system for covid-19 scholarly information management. arXiv preprint arXiv:2005.03975

  59. [59]

    Tan, Z., Zhao, X., and Wang, W. (2017). Representation learning of large-scale knowledge graphs via entity feature combinations. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management , CIKM '17, page 1777–1786, New York, NY, USA. Association for Computing Machinery

  60. [60]

    MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

    Tang, Y. and Yang, Y. (2024). MultiHop-RAG: Benchmarking retrieval-augmented generation for multi-hop queries. arXiv preprint arXiv:2401.15391

  61. [61]

    Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288

  62. [62]

    From Louvain to Leiden: Guaranteeing Well-Connected Communities

    Traag, V. A., Waltman, L., and Van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. Scientific Reports, 9(1)

  63. [63]

    Trajanoska, M., Stojanov, R., and Trajanov, D. (2023). Enhancing knowledge graph construction using large language models. ArXiv , abs/2305.04676

  64. [64]

    Trivedi, H., Balasubramanian, N., Khot, T., and Sabharwal, A. (2022). Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. arXiv preprint arXiv:2212.10509

  65. [65]

    Wang, J., Liang, Y., Meng, F., Sun, Z., Shi, H., Li, Z., Xu, J., Qu, J., and Zhou, J. (2023a). Is chatgpt a good nlg evaluator? a preliminary study. arXiv preprint arXiv:2303.04048

  66. [66]

    Wang, S., Khramtsova, E., Zhuang, S., and Zuccon, G. (2024). Feb4rag: Evaluating federated search in the context of retrieval augmented generation. arXiv preprint arXiv:2402.11891

  67. [67]

    Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. (2022). Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171

  68. [68]

    Knowledge Graph Prompting for Multi-Document Question Answering

    Wang, Y., Lipka, N., Rossi, R. A., Siu, A., Zhang, R., and Derr, T. (2023b). Knowledge graph prompting for multi-document question answering

  69. [69]

    Text Summarization with Latent Queries

    Xu, Y. and Lapata, M. (2021). Text summarization with latent queries. arXiv preprint arXiv:2106.00104

  70. [70]

    HotpotQA: A Dataset for Diverse, Explainable Multi-Hop Question Answering

    Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., and Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Conference on Empirical Methods in Natural Language Processing (EMNLP)

  71. [71]

    Yao, J.-g., Wan, X., and Xiao, J. (2017). Recent advances in document summarization. Knowledge and Information Systems , 53:297--336

  72. [72]

    Yao, L., Peng, J., Mao, C., and Luo, Y. (2023). Exploring large language models for knowledge graph completion

  73. [73]

    Yates, A., Banko, M., Broadhead, M., Cafarella, M., Etzioni, O., and Soderland, S. (2007). TextRunner: Open information extraction on the web. In Carpenter, B., Stent, A., and Williams, J. D., editors, Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAAC...

  74. [74]

    Yuan, X., Li, J., Wang, D., Chen, Y., Mao, X., Huang, L., Xue, H., Wang, W., Ren, K., and Wang, J. (2024). S-eval: Automatic and adaptive test generation for benchmarking safety evaluation of large language models. arXiv preprint arXiv:2405.14191

  75. [75]

    Zhang, J. (2023). Graph-toolformer: To empower llms with graph reasoning ability via prompt augmented by chatgpt. arXiv preprint arXiv:2304.11116

  76. [76]

    Zhang, Y., Zhang, Y., Gan, Y., Yao, L., and Wang, C. (2024a). Causal graph discovery with retrieval-augmented generation based large language models. arXiv preprint arXiv:2402.15301

  77. [77]

    Zhang, Z., Chen, J., and Yang, D. (2024b). Darg: Dynamic evaluation of large language models via adaptive reasoning graph. arXiv preprint arXiv:2406.17271

  78. [78]

    Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. (2024). Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in Neural Information Processing Systems , 36

  79. [79]

    Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., Deng, S., Chen, H., and Zhang, N. (2024). Llms for knowledge graph construction and reasoning: Recent capabilities and future opportunities