hub
Corrective Retrieval Augmented Generation
27 Pith papers cite this work. Polarity classification is still indexing.
abstract
Large language models (LLMs) inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
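To make the control flow described in the abstract concrete, here is a minimal runnable sketch of the confidence-triggered pipeline: evaluate the retrieval as a whole, then refine, replace, or augment the knowledge before generation. Everything in it is an illustrative assumption rather than the paper's implementation: the lexical-overlap evaluator stands in for CRAG's trained lightweight evaluator, the thresholds are arbitrary, and web_search and the final generation step are hypothetical stubs.

```python
# Hedged sketch of the CRAG control flow from the abstract.
# All helpers below are illustrative stand-ins, not the authors' code.

def evaluate_retrieval(query: str, docs: list[str]) -> float:
    """Stand-in for the lightweight retrieval evaluator: returns a
    confidence degree in [0, 1] for the retrieved documents as a set."""
    hits = sum(1 for d in docs if any(w in d.lower() for w in query.lower().split()))
    return hits / max(len(docs), 1)

def web_search(query: str) -> list[str]:
    """Placeholder for the large-scale web-search extension."""
    return [f"web result for: {query}"]

def decompose_then_recompose(docs: list[str], query: str) -> str:
    """Decompose documents into strips, keep strips that share a term
    with the query, and recompose them into filtered knowledge."""
    terms = set(query.lower().split())
    strips = [s.strip() for d in docs for s in d.split(".") if s.strip()]
    kept = [s for s in strips if terms & set(s.lower().split())]
    return ". ".join(kept)

def crag_generate(query: str, retrieved: list[str],
                  upper: float = 0.7, lower: float = 0.3) -> str:
    """Trigger a knowledge action from the evaluator's confidence degree."""
    confidence = evaluate_retrieval(query, retrieved)
    if confidence >= upper:        # high confidence: refine retrieved docs
        knowledge = decompose_then_recompose(retrieved, query)
    elif confidence <= lower:      # low confidence: fall back to web search
        knowledge = decompose_then_recompose(web_search(query), query)
    else:                          # ambiguous: combine both sources
        knowledge = decompose_then_recompose(retrieved + web_search(query), query)
    return f"[LLM answer conditioned on]: {knowledge}"

if __name__ == "__main__":
    print(crag_generate("who wrote Hamlet",
                        ["Hamlet was written by Shakespeare. It is a tragedy."]))
```

The design point the sketch preserves is that a confidence degree over the retrieval result as a whole, not per-document scores alone, decides whether to refine the retrieved documents, fall back to web search, or combine both before the decompose-then-recompose filtering.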
hub tools
citation-role summary
citation-polarity summary
years
2026: 27
roles
background: 1
polarities
background: 1
citing papers explorer
- Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict
Context-Driven Decomposition (CDD) measures context compliance in RAG under knowledge conflicts and improves accuracy on adversarial benchmarks like TruthfulQA misconception injection and Epi-Scale tests across models.
- Route Before Retrieve: Activating Latent Routing Abilities of LLMs for RAG vs. Long-Context Selection
Pre-Route elicits LLMs' latent routing abilities via structured prompts over metadata to proactively choose between RAG and long-context processing, outperforming reactive baselines in cost-effectiveness.
- The Context Gathering Decision Process: A POMDP Framework for Agentic Search
Framing LLM agent loops as a Context Gathering Decision Process (a POMDP) yields a predicate-based belief state that boosts multi-hop reasoning by up to 11.4% and an exhaustion gate that cuts token use by up to 39% with no performance loss.
- SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
- MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
MemFlow routes queries by intent to tiered memory operations, nearly doubling the accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.
- AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation
AdaGATE improves evidence F1 scores on HotpotQA for multi-hop RAG under clean, redundant, and noisy conditions by framing selection as gap-aware token-constrained repair, outperforming baselines while using 2.6x fewer tokens.
- HaS: Accelerating RAG through Homology-Aware Speculative Retrieval
HaS accelerates RAG retrieval via homology-aware speculative retrieval with homologous query re-identification validation, cutting latency by 24-37% with a 1-2% accuracy drop on the tested datasets.
- ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation
ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.
- Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing
Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.
- IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning
IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.
- Credo: Declarative Control of LLM Pipelines via Beliefs and Policies
Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.
- REGREACT: Self-Correcting Multi-Agent Pipelines for Structured Regulatory Information Extraction
RegReAct deploys self-correcting multi-agent pipelines across seven stages to extract hierarchical compliance criteria from regulatory texts, outperforming single-pass GPT-4o on EU Taxonomy documents.
- Retrieval Augmented Conversational Recommendation with Reinforcement Learning
RAR retrieves candidate items from a 300k-movie corpus, then uses LLM generation with RL feedback to produce context-aware recommendations that outperform baselines on benchmarks.
- PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning
PiCA uses pivot-based potential rewards derived from historical sub-queries to supply trajectory-aware step guidance in agentic RL, delivering 15% gains on QA benchmarks for 3B/7B models.
- Agentic Retrieval-Augmented Generation for Financial Document Question Answering
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
- CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation
CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 improvements.
- EviMem: Evidence-Gap-Driven Iterative Retrieval for Long-Term Conversational Memory
EviMem improves accuracy on temporal and multi-hop questions in long-term conversational memory by iteratively diagnosing and filling evidence gaps, achieving 81.6% and 85.2% judge accuracy on LoCoMo at 4.5x lower latency than MIRIX.
- Faithfulness-QA: A Counterfactual Entity Substitution Dataset for Training Context-Faithful RAG Models
Faithfulness-QA is a 99k-sample dataset created via counterfactual entity substitution on existing QA benchmarks to train and evaluate context-faithful RAG models.
- AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
- Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization
SCM-GRPO grounds multi-hop fact verification in structural causal models and applies GRPO reinforcement learning to optimize reasoning chain length, outperforming baselines on HoVer and EX-FEVER.
- FinGround: Detecting and Grounding Financial Hallucinations via Atomic Claim Verification
FinGround reduces financial hallucinations by 68% over baselines in retrieval-equalized tests through atomic claim verification and grounding, with an 8B model retaining 91.4% F1 at low cost.
- A Control Architecture for Training-Free Memory Use
A training-free control architecture with uncertainty-based routing, confidence-selective acceptance, and evidence-based memory governance improves arithmetic reasoning by 7 points on the SVAMP and ASDiv benchmarks.
- Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents
Tree-based reasoning outperforms vector search on complex document queries, but a hybrid approach balances results across tiers; validation on real finance documents shows an 11.7-point gap.
- Token Economics for LLM Agents: A Dual-View Study from Computing and Economics
The paper delivers a unified survey of token economics for LLM agents, conceptualizing tokens as production factors, exchange mediums, and units of account across micro, meso, macro, and security dimensions using established economic theories.
- An Agent-Oriented Pluggable Experience-RAG Skill for Experience-Driven Retrieval Strategy Orchestration
Experience-RAG Skill is a reusable agent skill that selects retrieval strategies via experience memory, achieving 0.8924 nDCG@10 on BeIR/nq, hotpotqa, and scifact while outperforming fixed retriever baselines.
- Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU
Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.
- STEM: Structure-Tracing Evidence Mining for Knowledge Graphs-Driven Retrieval-Augmented Generation
STEM is a framework for multi-hop KGQA that projects queries onto adaptive schema graphs and uses Triple-GNN guidance to retrieve more accurate and complete evidence subgraphs, reporting state-of-the-art results.