From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Canonical reference. 80% of citing Pith papers cite this work as background.

281 Pith papers citing it
Background 80% of classified citations
abstract

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. Our approach uses an LLM to build a graph index in two stages: first, to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that GraphRAG leads to substantial improvements over a conventional RAG baseline for both the comprehensiveness and diversity of generated answers.
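The pipeline described in the abstract can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_entity_graph`, `find_communities`, `summarize_communities`, `answer_global`, `toy_extract`, and `echo_llm` are hypothetical names; entity extraction is replaced by a capitalised-word heuristic where the paper uses an LLM; "groups of closely related entities" are approximated by connected components; and the LLM is any callable that maps a prompt string to a reply string.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def build_entity_graph(docs: List[str],
                       extract: Callable[[str], List[str]]) -> Dict[str, Set[str]]:
    """Indexing stage 1: link entities that co-occur in the same document."""
    graph: Dict[str, Set[str]] = {}
    for doc in docs:
        entities = sorted(set(extract(doc)))
        for entity in entities:
            graph.setdefault(entity, set())
        for a, b in combinations(entities, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_communities(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Assumption: connected components stand in for community detection."""
    seen: Set[str] = set()
    communities: List[List[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        communities.append(sorted(members))
    return communities


def summarize_communities(communities: List[List[str]], llm: LLM) -> List[str]:
    """Indexing stage 2: pregenerate one summary per community."""
    return [llm("Summarize these related entities: " + ", ".join(c))
            for c in communities]


def answer_global(question: str, summaries: List[str], llm: LLM) -> str:
    """Query time: one partial answer per summary, then a final summary."""
    partials = [llm(f"Question: {question}\nContext: {s}\nPartial answer:")
                for s in summaries]
    return llm(f"Question: {question}\nCombine these partial answers:\n"
               + "\n".join(partials))


def toy_extract(doc: str) -> List[str]:
    """Toy stand-in for LLM entity extraction: keep capitalised words."""
    return [w.strip(".,") for w in doc.split() if w[0].isupper()]


def echo_llm(prompt: str) -> str:
    """Placeholder LLM so the sketch runs without any external service."""
    return f"[reply to a {len(prompt)}-character prompt]"


if __name__ == "__main__":
    docs = ["Alice met Bob.", "Bob works with Carol.", "Dave founded Erin Labs."]
    communities = find_communities(build_entity_graph(docs, toy_extract))
    summaries = summarize_communities(communities, echo_llm)
    print(answer_global("What are the main themes in the dataset?",
                        summaries, echo_llm))
```

The point of the structure is that the expensive summarization happens once at indexing time, while each query only touches the pregenerated community summaries, which is what lets a global question draw on the whole corpus rather than on a handful of retrieved chunks.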

citation-role summary

background 40 · baseline 5 · method 3 · dataset 1

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

ContextNest: Verifiable Context Governance for Autonomous AI Agent

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

ContextNest formalizes context governance for AI agents using hash-chained documents and deterministic selectors, with experiments showing higher answer quality and perfect determinism versus standard retrieval.

Grounding LLM Reasoning under Incomplete Graph Evidence

cs.CL · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

Develops a theoretical perspective showing no hard rule can perfectly reject false unsupported trajectories while retaining true-but-unobserved ones under incomplete graph evidence, and characterizes soft grounding as KL-regularized deformation of the LLM prior.

Self-Augmenting Retrieval for Diffusion Language Models

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

SARDI uses lookahead tokens from low-confidence predictions in discrete diffusion language models to dynamically guide retrieval during denoising, outperforming training-free baselines on five multi-hop QA benchmarks at up to 8x higher throughput.

LifeSide: Benchmarking Agents as Lifelong Digital Companions

cs.CL · 2026-06-03 · unverdicted · novelty 7.0

LifeSide is a new benchmark that evaluates AI agents on multi-session Memory-Emotion-Environment loops via simulated user profiles and event trajectories, revealing that models saturating existing memory tests fail at long-horizon user understanding.

HyperPatch: Sequential Knowledge Editing Under n-ary Structural Drift

cs.CL · 2026-06-02 · unverdicted · novelty 7.0

HyperPatch reformulates sequential n-ary knowledge editing as hypergraph manifold stability, using HGNN initialization, SimHash alignment plus Topological LoRA, and fused reasoning to achieve large H-Acc gains on MQuAKE benchmarks.

RWGBench: Evaluating Scholarly Positioning in Related Work Generation

cs.DL · 2026-05-30 · unverdicted · novelty 7.0

RWGBench is a citation-centric benchmark for related work generation built from 40k CS papers and a 100-paper test set, with multi-dimensional metrics that better match human expert judgment than standard similarity scores.

MemGym: a Long-Horizon Memory Environment for LLM Agents

cs.CL · 2026-05-20 · unverdicted · novelty 7.0

MemGym unifies agent gyms into a memory benchmark with isolated scoring across tool-use, research, coding, and computer-use regimes plus a lightweight reward model for tractable coding evaluation.
