Retrieval-Augmented Generation for Large Language Models: A Survey
Large Language Models (LLMs) showcase impressive capabilities but encounter challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of generation, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and the integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review offers a detailed examination of the progression of RAG paradigms, encompassing Naive RAG, Advanced RAG, and Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which comprises retrieval, generation, and augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, it introduces up-to-date evaluation frameworks and benchmarks. Finally, it delineates the challenges currently faced and points out prospective avenues for research and development.
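The retrieval–augmentation–generation pipeline the abstract describes can be sketched minimally. This is an illustrative toy, not the survey's method: it uses a sparse bag-of-words similarity in place of the dense neural retrievers the literature actually uses, and the corpus, query, and function names are all invented for the example. The generator itself (the LLM call) is not modeled; the sketch stops at the augmented prompt.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy sparse bag-of-words "embedding"; real RAG systems use
    # dense neural encoders instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Retrieval: rank corpus passages by similarity to the query.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, corpus, k=2):
    # Augmentation: prepend the retrieved passages to the question, so the
    # generator (an LLM, not modeled here) can answer from external evidence.
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(hits))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RAG retrieves passages from an external database at query time.",
    "LLMs can hallucinate facts that are absent from their training data.",
    "Tokenizers split raw text into subword units.",
]
print(build_prompt("Why do LLMs hallucinate?", corpus, k=1))
```

In Naive RAG this index-retrieve-augment loop is the whole story; the Advanced and Modular paradigms the survey covers add stages such as query rewriting, reranking, and iterative retrieval around it.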
Forward citations
Cited by 60 Pith papers
-
Knowledge Poisoning Attacks on Medical Multi-Modal Retrieval-Augmented Generation
M³Att poisons medical multimodal RAG by pairing covert textual misinformation with query-agnostic visual perturbations that increase retrieval of the bad content, causing LLMs to generate clinically plausible but inco...
-
Trojan Hippo: Weaponizing Agent Memory for Data Exfiltration
Trojan Hippo attacks on LLM agent memory achieve 85-100% success rates in data exfiltration across four memory backends even after 100 benign sessions, while evaluated defenses reduce success rates but impose varying ...
-
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing
The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.
-
Utility-Oriented Visual Evidence Selection for Multimodal Retrieval-Augmented Generation
Evidence utility is defined as information gain on the model's output distribution, with ranking by gain on a latent helpfulness variable shown equivalent to answer-space utility under mild assumptions, enabling a tra...
-
A Hybrid Framework for Natural Language Querying of IFC Models with Relational and Graph Representations
IfcLLM combines relational and graph representations of IFC models with iterative LLM reasoning to deliver 93.3-100% first-attempt accuracy on natural language queries across three test models.
-
Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation
PyRAG turns multi-hop reasoning into executable Python code over retrieval tools for explicit, verifiable step-by-step RAG.
-
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning
DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.
-
EquiMem: Calibrating Shared Memory in Multi-Agent Debate via Game-Theoretic Equilibrium
EquiMem calibrates shared memory in multi-agent debate by computing a game-theoretic equilibrium from agent queries and paths, outperforming heuristics and LLM validators across benchmarks while remaining robust to ad...
-
SEM-RAG: Structure-Preserving Multimodal Graph Compilation and Entropy-Guided Retrieval for Telecommunication Standards
SEM-RAG compiles telecommunication standards into structure-preserving graphs and uses entropy-guided retrieval to reach 94.1% accuracy on TeleQnA and 93.8% on ORAN-Bench-13K while reducing indexing token usage compar...
-
MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
MANTRA automatically synthesizes SMT-validated compliance benchmarks for LLM agents from natural language manuals and tool schemas, producing 285 tasks across 6 domains with minimal human effort.
-
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
-
Stateful Agent Backdoor
A stateful backdoor for LLM agents, modeled as a Mealy machine with a decomposition framework, enables incremental malicious actions across sessions and achieves 80-95% attack success rate on four models.
-
Privacy Without Losing Place: A Paradigm for Private Retrieval in Spatial RAGs
PAS encodes locations via relative anchors and bins to deliver roughly 370-400m adversarial error in spatial RAG while retaining over half the baseline retrieval performance and keeping generation quality robust.
-
Telegraph English: Semantic Prompt Compression via Structured Symbolic Rewriting
Telegraph English compresses prompts via structured symbolic rewriting into atomic facts, achieving roughly 50% token reduction with 99.1% key-fact accuracy on LongBench-v2 and outperforming token-deletion baselines a...
-
E-MIA: Exam-Style Black-Box Membership Inference Attacks against RAG Systems
E-MIA converts document details into four types of exam questions and aggregates the RAG's answers into a membership score that separates member and non-member documents better than prior similarity-based or probe-bas...
-
ReLay: Personalized LLM-Generated Plain-Language Summaries for Better Understanding, but at What Cost?
Personalized LLM-generated plain language summaries improve lay readers' comprehension and quality ratings but increase risks of reinforcing biases and introducing hallucinations compared to static expert summaries.
-
TADI: Tool-Augmented Drilling Intelligence via Agentic LLM Orchestration over Heterogeneous Wellsite Data
TADI shows that domain-specialized tools orchestrated by an LLM over dual structured and semantic databases can convert heterogeneous wellsite data into evidence-grounded drilling intelligence, with tool design matter...
-
When to Retrieve During Reasoning: Adaptive Retrieval for Large Reasoning Models
ReaLM-Retrieve uses step-level uncertainty to trigger retrievals during reasoning, achieving 10.1% better F1 scores and 47% fewer calls on multi-hop QA benchmarks.
-
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates
RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than p...
-
Context-Augmented Code Generation: How Product Context Improves AI Coding Agent Decision Compliance by 49%
Adding product context retrieval to AI coding agents raises decision compliance from 46% to 95% on a new benchmark of 8 tasks with 41 weighted decision points.
-
XGRAG: A Graph-Native Framework for Explaining KG-based Retrieval-Augmented Generation
XGRAG uses graph perturbations to quantify component contributions in GraphRAG and achieves 14.81% better explanation quality than text-based baselines on QA datasets, with correlations to graph centrality.
-
Similar Users-Augmented Interest Network
SUIN improves CTR prediction by augmenting target user sequences with similar users' behaviors via embedding-based retrieval, user-specific position encoding, and user-aware target attention.
-
Uncertainty Propagation in LLM-Based Systems
This paper introduces a systems-level conceptual framing and a three-level taxonomy (intra-model, system-level, socio-technical) for uncertainty propagation in compound LLM applications, along with engineering insight...
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
Participatory provenance as representational auditing for AI-mediated public consultation
Participatory provenance auditing of Canada's AI strategy consultation shows official AI summaries exclude 15-17% of participants more than random baselines, with 33-88% exclusion for dissent clusters.
-
Learning When Not to Decide: A Framework for Overcoming Factual Presumptuousness in AI Adjudication
A new structured prompting method (SPEC) helps AI detect insufficient evidence in adjudication tasks and defer decisions appropriately, reaching 89% accuracy on a benchmark varying information completeness from Colora...
-
ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation
ArbGraph resolves conflicts in RAG evidence by constructing a conflict-aware graph of atomic claims and applying intensity-driven iterative arbitration to suppress unreliable claims prior to generation.
-
STRIDE: Strategic Iterative Decision-Making for Retrieval-Augmented Multi-Hop Question Answering
STRIDE uses a meta-planner for entity-agnostic reasoning skeletons and a supervisor for dependency-aware execution to improve retrieval-augmented multi-hop QA.
-
Skill-RAG: Failure-State-Aware Retrieval Augmentation via Hidden-State Probing and Skill Routing
Skill-RAG detects retrieval failure states from hidden representations and routes to one of four corrective skills to raise accuracy on persistent hard cases in open-domain QA and reasoning benchmarks.
-
ASTRA: Enhancing Multi-Subject Generation with Retrieval-Augmented Pose Guidance and Disentangled Position Embedding
ASTRA disentangles subject identity from pose structure in diffusion transformers via retrieval-augmented pose guidance, asymmetric EURoPE embeddings, and a DSM adapter to improve multi-subject generation.
-
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
MM-Doc-R1 combines an agentic workflow with Similarity-based Policy Optimization (SPO) to achieve 10.4% higher performance than prior baselines on long-document visual question answering.
-
Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size
Contextual entrainment decreases for semantic contexts but increases for non-semantic ones as LLMs scale, following power-law trends with 4x better resistance to misinformation but 2x more copying of arbitrary tokens.
-
Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method
ConflictQA benchmark shows LLMs fail to resolve conflicts between text and KG evidence and often default to one source, motivating the XoT explanation-based reasoning method.
-
VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning
VISOR is a unified agentic VRAG framework with Evidence Space structuring, visual action evaluation/correction, and dynamic sliding-window trajectories trained via GRPO-based RL that achieves SOTA performance on long-...
-
Decoupling Vector Data and Index Storage for Space Efficiency
DecoupleVS decouples vector data and index storage in ANNS systems to cut storage space by up to 58.7% with competitive search and update performance.
-
ROZA Graphs: Self-Improving Near-Deterministic RAG through Evidence-Centric Feedback
ROZA graphs enable self-improving RAG by storing evidence-specific reasoning chains, yielding up to 10.6pp accuracy gains and 46% lower cost through graph traversal feedback.
-
An End-to-End Approach for Fixing Concurrency Bugs via SHB-Based Context Extractor
ConFixAgent repairs diverse concurrency bugs end-to-end by using Static Happens-Before graphs to extract relevant code context for LLMs, outperforming prior tools in benchmarks.
-
Can You Trust the Vectors in Your Vector Database? Black-Hole Attack from Embedding Space Defects
Injecting a few malicious vectors near the centroid exploits centrality-driven hubness in high-dimensional embeddings, causing them to dominate top-k retrievals in up to 99.85% of cases.
-
Architecture Without Architects: How AI Coding Agents Shape Software Architecture
AI coding agents perform vibe architecting by making prompt-driven architectural choices that produce structurally different systems for identical tasks.
-
Unified and Efficient Approach for Multi-Vector Similarity Search
MV-HNSW is the first native hierarchical graph index for multi-vector data, achieving over 90% recall with up to 14x lower search latency than prior filter-and-refine approaches across seven datasets.
-
AnnoRetrieve: Efficient Structured Retrieval for Unstructured Document Analysis
AnnoRetrieve uses auto-generated structured schemas and queries to retrieve information from unstructured documents more efficiently and accurately than embedding-based methods.
-
From PDF to RAG-Ready: Evaluating Document Conversion Frameworks for Domain-Specific Question Answering
Docling with hierarchical splitting reaches 94.1% RAG accuracy on domain documents, beating naive PDF loading but trailing manual Markdown curation at 97.1%.
-
Why Retrieval-Augmented Generation Fails: A Graph Perspective
Attribution graphs reveal that RAG failures arise from shallow fragmented evidence flow in LLMs, enabling topology-based detection and targeted interventions that reinforce question-guided routing.
-
Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation
Derivation Prompting constructs logic-based derivation trees in RAG generation to improve interpretability and reduce unacceptable answers compared to standard RAG or long-context methods in a case study.
-
SieveFL: Hierarchical Runtime-Aware Pruning for Scalable LLM-Based Fault Localization
SieveFL combines vector retrieval and JaCoCo runtime pruning to cut LLM token use by 49% while achieving 41.8% Top-1 accuracy on 395 Defects4J bugs, outperforming AgentFL.
-
PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
PAI-2 improves factual correctness in LLM answers by 4% on average across benchmarks using adaptive graph traversal and planning, with 6% gains from traversal algorithms and 18% from enabled planning.
-
Task-Adaptive Embedding Refinement via Test-time LLM Guidance
Test-time LLM feedback refines query embeddings to deliver up to 25% relative gains on zero-shot literature search, intent detection, and related benchmarks.
-
Scalable Token-Level Hallucination Detection in Large Language Models
TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text and scale from 0.6B to 8B parameters, outperforming larger reaso...
-
SAGE: A Self-Evolving Agentic Graph-Memory Engine for Structure-Aware Associative Memory
SAGE is a self-evolving agentic graph-memory engine that dynamically constructs and refines structured memory graphs via writer-reader feedback, yielding performance gains on multi-hop QA, open-domain retrieval, and l...
-
LegalCiteBench: Evaluating Citation Reliability in Legal Language Models
LegalCiteBench reveals that current LLMs achieve under 7% accuracy on closed-book legal citation retrieval and completion tasks, with misleading answer rates above 94% for nearly all tested models.
-
Black-box model classification under the discriminative factorization
Discriminative factorization distinguishes high-quality query sets for black-box model classification, with chance-level error decaying exponentially in query budget and parameters predicting empirical decay rates on ...
-
Characterizing and Mitigating False-Positive Bug Reports in the Linux Kernel
False-positive bug reports in the Linux kernel consume effort comparable to real bugs and can be filtered by LLMs using retrieval-augmented generation at 88% F1.
-
Geometry-Aided Channel Deduction: A Robust Channel Acquisition Framework Utilizing Coarse Scenario Prompt
GCD extracts approximate geometric channel features from coarse scenario maps using ray tracing and neighborhood search, converts them to pseudo-channels via feature alignment, and fuses them with partial pilot estima...
-
WiCER: Wiki-memory Compile, Evaluate, Refine Iterative Knowledge Compilation for LLM Wiki Systems
WiCER iteratively diagnoses and repairs fact loss during wiki compilation for LLMs, recovering 80% of quality lost in blind distillation across 17 domains while cutting catastrophic failures by 55%.
-
Text-to-CAD Retrieval: a Strong Baseline
Text-to-CAD retrieval is introduced as a cross-modal task with a baseline that learns joint embeddings from CAD construction sequences, point clouds, and text queries via a masked feature decoder.
-
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9...
-
CAR: Query-Guided Confidence-Aware Reranking for Retrieval-Augmented Generation
CAR reranks documents in RAG by promoting those that increase generator confidence (via answer consistency sampling) and demoting those that decrease it, yielding NDCG@5 gains on BEIR datasets that correlate with F1 i...
-
CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment
CASCADE enables LLMs to continually adapt at deployment via case-based episodic memory and contextual bandits, improving macro-averaged success by 20.9% over zero-shot on 16 tasks spanning medicine, law, code, and robotics.
-
RAG over Thinking Traces Can Improve Reasoning Tasks
RAG over structured thinking traces boosts LLM reasoning on AIME, LiveCodeBench, and GPQA, with relative gains up to 56% and little added cost.
-
Verbal-R3: Verbal Reranker as the Missing Bridge between Retrieval and Reasoning
Verbal-R3 uses a verbal reranker to generate analytic narratives that guide retrieval and reasoning in LLMs, achieving SOTA results on complex QA benchmarks.