From Local to Global: A Graph RAG Approach to Query-Focused Summarization

Alex Chao, Apurva Mody, Darren Edge, Ha Trinh, Joshua Bradley, Newman Cheng · 2024 · cs.CL · arXiv 2404.16130

102 Pith papers cite this work. Polarity classification is still indexing.

abstract

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, do not scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose GraphRAG, a graph-based approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text. Our approach uses an LLM to build a graph index in two stages: first, to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that GraphRAG leads to substantial improvements over a conventional RAG baseline for both the comprehensiveness and diversity of generated answers.
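The two indexing stages and the query step described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the function names, and the prompt strings are all hypothetical, and connected components stand in for whatever community detection a real system would use.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


def build_communities(edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Group entities of an entity graph into communities.

    Stand-in only: uses connected components (union-find) rather than a
    real community detection algorithm.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Sort for a deterministic order across runs.
    return sorted(groups.values(), key=lambda g: sorted(g))


def summarize_communities(communities: List[Set[str]], llm: LLM) -> List[str]:
    """Pregenerate one summary per community (index time)."""
    return [
        llm("Summarize how these entities relate: " + ", ".join(sorted(c)))
        for c in communities
    ]


def answer_global_question(question: str, summaries: List[str], llm: LLM) -> str:
    """Answer a corpus-wide question from community summaries (query time)."""
    # Map step: each community summary yields one partial response.
    partials = [
        llm(f"Question: {question}\nContext: {s}\nPartial answer:")
        for s in summaries
    ]
    # Reduce step: all partial responses are summarized into a final response.
    return llm(
        f"Question: {question}\nCombine these partial answers:\n"
        + "\n".join(partials)
    )


if __name__ == "__main__":
    # Toy stand-in for an LLM so the sketch runs without any external service.
    def toy_llm(prompt: str) -> str:
        return f"<response to {len(prompt)}-char prompt>"

    toy_edges = [("Alice", "Acme"), ("Acme", "Bob"), ("Carol", "Initech")]
    toy_summaries = summarize_communities(build_communities(toy_edges), toy_llm)
    print(answer_global_question("What are the main themes?", toy_summaries, toy_llm))
```

Because the community summaries are built once at index time, a query in this sketch costs one LLM call per summary plus one final call, regardless of how much source text sits behind each summary.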

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

cs.AI · 2026-05-12 · conditional · novelty 8.0

MedMemoryBench supplies a 2,000-session synthetic medical trajectory dataset and an evaluate-while-constructing streaming protocol to expose memory saturation and reasoning failures in current agent architectures for personalized healthcare.

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0

ShadowMerge poisons graph-based agent memory via relation-channel conflicts using an AIR pipeline, achieving 93.8% average attack success rate on Mem0 and three real-world datasets while bypassing existing defenses.

Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration

cs.CR · 2026-05-03 · unverdicted · novelty 8.0

Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying utility costs.

Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

PyRAG turns multi-hop reasoning into executable Python code over retrieval tools for explicit, verifiable step-by-step RAG.

MEME: Multi-entity & Evolving Memory Evaluation

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

All tested LLM memory systems fail at dependency reasoning in multi-entity evolving scenarios, with only an expensive file-based setup showing partial recovery.

Goal-Oriented Reasoning for RAG-based Memory in Conversational Agentic LLM Systems

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

Goal-Mem improves RAG memory retrieval in agentic LLMs by explicit goal decomposition and backward chaining via Natural Language Logic, outperforming nine baselines on multi-hop and implicit inference tasks.

MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

MulTaBench is a new collection of 40 image-tabular and text-tabular datasets designed to test target-aware representation tuning in multimodal tabular models.

DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning

cs.CL · 2026-05-11 · unverdicted · novelty 7.0

DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

cs.AI · 2026-05-11 · unverdicted · novelty 7.0

MAGE uses a four-subgraph co-evolutionary knowledge graph plus dual bandits to externalize and retrieve experience for stable self-evolution of frozen language-model agents, showing gains on nine diverse benchmarks.

SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards

eess.SP · 2026-05-09 · unverdicted · novelty 7.0

SEM-RAG compiles telecommunication standards into structure-preserving graphs and uses entropy-guided retrieval to reach 94.1% accuracy on TeleQnA and 93.8% on ORAN-Bench-13K while reducing indexing token usage compared to standard GraphRAG.

When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

cs.AI · 2026-05-07 · accept · novelty 7.0

Framing LLM agent loops as a Context Gathering Decision Process POMDP yields a predicate-based belief state that boosts multi-hop reasoning up to 11.4% and an exhaustion gate that cuts token use up to 39% with no performance loss.

MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

MANTRA automatically synthesizes SMT-validated compliance benchmarks for LLM agents from natural language manuals and tool schemas, producing 285 tasks across 6 domains with minimal human effort.

SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

cs.MA · 2026-05-05 · unverdicted · novelty 7.0

MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory

cs.CL · 2026-05-01 · unverdicted · novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.

XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation

cs.AI · 2026-04-27 · unverdicted · novelty 7.0

XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.

Skill Retrieval Augmentation for Agentic AI

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

Agents improve when they retrieve skills on demand from large corpora, yet current models cannot selectively decide when to load or ignore a retrieved skill.

A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding

cs.AI · 2026-04-21 · unverdicted · novelty 7.0

A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.

Structure Guided Retrieval-Augmented Generation for Factual Queries

cs.IR · 2026-04-21 · unverdicted · novelty 7.0

SG-RAG frames retrieval as subgraph matching to ensure LLMs meet every condition in factual queries and reports large gains over baselines on a new 120k-pair ERQA dataset.

ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.

STRIDE: Strategic Iterative Decision-Making for Retrieval-Augmented Multi-Hop Question Answering

cs.AI · 2026-04-19 · unverdicted · novelty 7.0

STRIDE uses a meta-planner for entity-agnostic reasoning skeletons and a supervisor for dependency-aware execution to improve retrieval-augmented multi-hop QA.

SAGER: Self-Evolving User Policy Skills for Recommendation Agent

cs.IR · 2026-04-16 · unverdicted · novelty 7.0

SAGER equips LLM recommendation agents with per-user evolving policy skills via two-representation architecture, contrastive CoT diagnosis, and skill-augmented listwise reasoning, yielding SOTA gains orthogonal to memory accumulation.

ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback

cs.AI · 2026-04-08 · unverdicted · novelty 7.0

ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.

citing papers explorer

MisEdu-RAG: A Misconception-Aware Dual-Hypergraph RAG for Novice Math Teachers cs.IR · 2026-04-05 · unverdicted · none · ref 7 · internal anchor
MisEdu-RAG builds concept and instance hypergraphs for two-stage retrieval of pedagogical knowledge and student errors, improving feedback quality on the MisstepMath benchmark by 10.95% token-F1 and up to 15.3% on response dimensions.
AnnoRetrieve: Efficient Structured Retrieval for Unstructured Document Analysis cs.IR · 2026-04-03 · unverdicted · none · ref 3 · internal anchor
AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.
Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems cs.IR · 2026-04-01 · unverdicted · none · ref 3 · internal anchor
Agentic search narrows the gap between dense RAG and GraphRAG but does not remove GraphRAG's advantage on complex multi-hop reasoning.
Cognifold: Always-On Proactive Memory via Cognitive Folding cs.AI · 2026-05-13 · unverdicted · none · ref 11 · internal anchor
Cognifold is a new proactive memory architecture that folds event streams into emergent cognitive structures by extending complementary learning systems theory with a prefrontal intent layer and graph topology self-organization.
IdeaForge: A Knowledge Graph-Grounded Multi-Agent Framework for Cross-Methodology Innovation Analysis and Patent Claim Generation cs.AI · 2026-05-13 · unverdicted · none · ref 12 · internal anchor
IdeaForge combines multiple innovation methodologies through specialist agents on a persistent knowledge graph, using cross-methodology convergent claim linkages to rank and draft patent claims with higher traceability than single-method baselines.
PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents cs.CL · 2026-05-12 · unverdicted · none · ref 4 · internal anchor
PRISM achieves higher accuracy than baselines on long-horizon agent tasks at an order-of-magnitude smaller context budget by combining hierarchical bundle search, query-sensitive costing, evidence compression, and adaptive intent routing over structured memory.
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory cs.AI · 2026-05-12 · unverdicted · none · ref 206 · internal anchor
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and long-term agent benchmarks.
SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs cs.CL · 2026-05-12 · unverdicted · none · ref 3 · internal anchor
SkillGraph represents skills as nodes in an evolving directed graph with typed dependency edges and updates the graph from RL trajectories to boost compositional task performance.
Leveraging RAG for Training-Free Alignment of LLMs cs.LG · 2026-05-11 · unverdicted · none · ref 20 · internal anchor
RAG-Pref is a training-free RAG-based alignment technique that conditions LLMs on contrastive preference samples during inference, yielding over 3.7x average improvement in agentic attack refusals when combined with offline methods across five LLMs.
ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV cs.CL · 2026-05-11 · conditional · none · ref 7 · internal anchor
Intent-aware retrieval over assertion-labeled knowledge graphs improves clinical QA accuracy by 22 percentage points on a new MIMIC-IV benchmark that stresses negation, temporality, and attribution.
ASTRA-QA: A Benchmark for Abstract Question Answering over Documents cs.CL · 2026-05-11 · unverdicted · none · ref 2 · internal anchor
ASTRA-QA is a benchmark for abstract document question answering that uses explicit topic sets, unsupported content annotations, and evidence alignments to enable direct scoring of coverage and hallucination.
SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution cs.CL · 2026-05-11 · unverdicted · none · ref 35 · internal anchor
SkillRAE organizes skills into a graph and compiles compact, grounded contexts for LLM agents, yielding 11.7% gains on SkillsBench over prior RAE methods.
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution cs.AI · 2026-05-11 · unverdicted · none · ref 13 · internal anchor
HAGE proposes a trainable weighted graph memory framework with LLM intent classification, dynamic edge modulation, and RL optimization that improves long-horizon reasoning accuracy in agentic LLMs over static baselines.
Generating Leakage-Free Benchmarks for Robust RAG Evaluation cs.CL · 2026-05-09 · unverdicted · none · ref 5 · internal anchor
SeedRG generates novel, leakage-free RAG benchmark examples from seed data by mapping reasoning structures and swapping entities while applying consistency and leakage checks.
LARAG: Link-Aware Retrieval Strategy for RAG Systems in Hyperlinked Technical Documentation cs.IR · 2026-05-08 · unverdicted · none · ref 8 · internal anchor
LARAG improves RAG answer quality on hyperlinked technical documentation by using author-defined links for retrieval, achieving higher BERTScore while using fewer chunks and tokens than standard embedding-based RAG.
Topic Is Not Agenda: A Citation-Community Audit of Text Embeddings cs.IR · 2026-05-08 · unverdicted · none · ref 19 · internal anchor
Embeddings retrieve same-subfield papers at 45-52% but same-agenda papers at only 15-21%; citation rerank reaches 57-59% on agenda queries.
Query-efficient model evaluation using cached responses cs.LG · 2026-05-08 · unverdicted · none · ref 125 · internal anchor
DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.
WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems cs.CL · 2026-05-08 · conditional · none · ref 8 · internal anchor
WiCER iteratively diagnoses and repairs fact loss during wiki compilation for LLMs, recovering 80% of quality lost in blind distillation across 17 domains while cutting catastrophic failures by 55%.
Group of Skills: Group-Structured Skill Retrieval for Agent Skill Libraries cs.CL · 2026-05-07 · unverdicted · none · ref 9 · internal anchor
GoSkills converts flat skill lists into role-labeled execution contexts via anchor-centered groups and graph expansion, preserving coverage and improving rewards on SkillsBench and ALFWorld under small skill budgets.
ScrapMem: A Bio-inspired Framework for On-device Personalized Agent Memory via Optical Forgetting cs.AI · 2026-05-05 · unverdicted · none · ref 58 · internal anchor
ScrapMem introduces optical forgetting to compress multimodal memories for LLM agents on edge devices, cutting storage by up to 93% while reaching 51.0% Joint@10 and 70.3% Recall@10 on ATM-Bench.
CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification cs.CL · 2026-05-05 · unverdicted · none · ref 16 · internal anchor
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
Retrieval and Multi-Hop Reasoning in 1M-Token Context Windows: Evaluating LLMs on Classical Chinese Text cs.AI · 2026-05-04 · unverdicted · none · ref 2 · internal anchor
Frontier LLMs solve single-needle retrieval at 1M tokens on classical Chinese but show three distinct accuracy-decay patterns in three-hop reasoning between 256K and 1M tokens.
Enhancing Judgment Document Generation via Agentic Legal Information Collection and Rubric-Guided Optimization cs.CL · 2026-05-03 · unverdicted · none · ref 3 · internal anchor
Judge-R1 improves LLM judgment document generation by combining agentic legal information retrieval with GRPO-based rubric-guided optimization, outperforming baselines on the JuDGE benchmark.
From Unstructured Recall to Schema-Grounded Memory: Reliable AI Memory via Iterative, Schema-Aware Extraction cs.AI · 2026-04-30 · unverdicted · none · ref 20 · internal anchor
Schema-aware iterative extraction turns AI memory into a verified system of record, reaching 90-97% accuracy on extraction and end-to-end memory benchmarks where retrieval baselines score 80-87%.
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era cs.AI · 2026-04-30 · unverdicted · none · ref 2 · internal anchor
ObjectGraph is a Markdown superset file format that represents documents as traversable knowledge graphs, achieving up to 95.3% token reduction for agents with no significant accuracy loss.
Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations cs.AI · 2026-04-27 · unverdicted · none · ref 48 · internal anchor
Grounding LLMs via node-wise anchors in a traffic scenario taxonomy improves law-scenario matching by 29.1% and derived requirement accuracy by 36.9-38.2% on Chinese laws and 5,897 scenarios, enabling a compliance layer and real-time monitor for AVs.
