CuSearch reallocates rollout budget in RLVR toward deeper-search trajectories as a proxy for retrieval supervision density, yielding up to 11.8 exact-match gains over uniform GRPO sampling on ZeroSearch.
hub Canonical reference
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
Canonical reference. 90% of citing Pith papers cite this work as background.
abstract
Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic, real-time queries, resulting in outdated or inaccurate outputs. Retrieval-Augmented Generation (RAG) has emerged as a solution, enhancing LLMs by integrating real-time data retrieval to provide contextually relevant and up-to-date responses. Despite its promise, traditional RAG systems are constrained by static workflows and lack the adaptability required for multi-step reasoning and complex task management. Agentic Retrieval-Augmented Generation (Agentic RAG) transcends these limitations by embedding autonomous AI agents into the RAG pipeline. These agents leverage agentic design patterns reflection, planning, tool use, and multi-agent collaboration to dynamically manage retrieval strategies, iteratively refine contextual understanding, and adapt workflows through operational structures ranging from sequential steps to adaptive collaboration. This integration enables Agentic RAG systems to deliver flexibility, scalability, and context-awareness across diverse applications. This paper presents an analytical survey of Agentic RAG systems. It traces the evolution of RAG paradigms, introduces a principled taxonomy of Agentic RAG architectures based on agent cardinality, control structure, autonomy, and knowledge representation, and provides a comparative analysis of design trade-offs across existing frameworks. The survey examines applications in healthcare, finance, education, and enterprise document processing, and distills practical lessons for system designers and practitioners. Finally, it identifies key open research challenges related to evaluation, coordination, memory management, efficiency, and governance, outlining directions for future research.
hub tools
citation-role summary
citation-polarity summary
roles
background 10representative citing papers
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.
Corpus2Skill converts document corpora into navigable hierarchical skill directories for LLM agents, improving QA and RAG quality on single-domain enterprise data but not on open-domain or tabular corpora.
E2E-REME outperforms nine LLMs in accuracy and efficiency for end-to-end microservice remediation by using experience-simulation reinforcement fine-tuning on a new benchmark called MicroRemed.
DotRAG reformulates graph retrieval as query-guided path reasoning with Division of Thought, reporting SOTA results on MetaQA and UltraDomain for multi-hop tasks.
Q-RAG trains embedders via RL for multi-step retrieval and reports state-of-the-art results on BabiLong and RULER benchmarks for contexts up to 10M tokens.
HiPRAG adds hierarchical process rewards to RL training for agentic RAG, reducing over-search to 2.3% and achieving 65.4-67.2% accuracy on seven QA benchmarks across 3B and 7B models.
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99% of the time.
Agentic GraphRAG constructs a Neo4j graph via deterministic structured ingestion plus LLM extraction from notices, then deploys modular agents with tool access and reflection to outperform vector-RAG baselines on Swiss commercial gazette data across entity resolution, answer quality, and multi-turn
ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
SpecHop accelerates multi-hop LLM tool use via continuous multi-threaded speculation with asynchronous verification, approaching oracle latency gains and reducing latency up to 40% on retrieval tasks.
GraphMind builds and evolves action-centric workflow graphs from operational traces using offline extraction, online multi-agent traversal with LLMs, and Adaptive Traversal Reinforcement to improve automation in cloud database incident handling.
AMATA is an adaptive multi-agent trajectory alignment system that improves factual consistency in knowledge-intensive QA via intra-trajectory preference learning and inter-agent dependency optimization.
PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.
SiriusHelper deploys an LLM agent with intent routing, DeepSearch multi-hop retrieval, and automated SOP distillation to outperform alternatives and reduce ticket volume by 20.8% on Tencent's big data platform.
QPP methods can select query variants that boost end-to-end RAG quality over the original query, though retrieval-optimized variants often fail to produce the best generated answers, revealing a utility gap.
citing papers explorer
-
CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG
CuSearch reallocates rollout budget in RLVR toward deeper-search trajectories as a proxy for retrieval supervision density, yielding up to 11.8 exact-match gains over uniform GRPO sampling on ZeroSearch.
-
LatentRAG: Latent Reasoning and Retrieval for Efficient Agentic RAG
LatentRAG performs agentic RAG by generating latent tokens for thoughts and subqueries in one forward pass, matching explicit methods' accuracy on seven benchmarks while reducing latency by ~90%.
-
SCOUT: Active Information Foraging for Long-Text Understanding with Decoupled Epistemic States
SCOUT achieves state-of-the-art long-text understanding with up to 8x lower token use by actively foraging for sparse query-relevant information and updating a compact provenance-grounded epistemic state.
-
RAG-Reflect: Agentic Retrieval-Augmented Generation with Reflections for Comment-Driven Code Maintenance on Stack Overflow
RAG-Reflect achieves F1=0.78 on valid comment-edit prediction using retrieval-augmented reasoning and self-reflection, outperforming baselines and approaching fine-tuned models without retraining.
-
A-MAR: Agent-based Multimodal Art Retrieval for Fine-Grained Artwork Understanding
A-MAR decomposes art queries into reasoning plans to condition retrieval, leading to improved explanation quality and multi-step reasoning on art benchmarks compared to baselines.
-
Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
Corpus2Skill converts document corpora into navigable hierarchical skill directories for LLM agents, improving QA and RAG quality on single-domain enterprise data but not on open-domain or tabular corpora.
-
E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning
E2E-REME outperforms nine LLMs in accuracy and efficiency for end-to-end microservice remediation by using experience-simulation reinforcement fine-tuning on a new benchmark called MicroRemed.
-
DOTRAG: Retrieval-Time Reasoning Along Paths
DotRAG reformulates graph retrieval as query-guided path reasoning with Division of Thought, reporting SOTA results on MetaQA and UltraDomain for multi-hop tasks.
-
Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
Q-RAG trains embedders via RL for multi-step retrieval and reports state-of-the-art results on BabiLong and RULER benchmarks for contexts up to 10M tokens.
-
HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
HiPRAG adds hierarchical process rewards to RL training for agentic RAG, reducing over-search to 2.3% and achieving 65.4-67.2% accuracy on seven QA benchmarks across 3B and 7B models.
-
Retrieval from Within: An Intrinsic Capability of Attention-Based Models
Attention-based models can retrieve evidence intrinsically by using decoder attention to score and reuse their own pre-encoded chunks, outperforming separate retrieval pipelines on QA benchmarks.
-
Agentic Retrieval-Augmented Generation for Financial Document Question Answering
FinAgent-RAG achieves 76.81-78.46% execution accuracy on financial QA benchmarks by combining contrastive retrieval, program-of-thought code generation, and adaptive strategy routing, outperforming baselines by 5.62-9.32 points.
-
An Agentic Approach to Metadata Reasoning
Metadata Reasoner uses agentic LLM reasoning on metadata to select sufficient and minimal data sources, achieving 83.16% F1 on KramaBench and 85.5% F1 on noisy synthetic benchmarks while avoiding low-quality tables 99% of the time.
-
Agentic GraphRAG: Navigating Unstructured Financial Data with Collaborative AI
Agentic GraphRAG constructs a Neo4j graph via deterministic structured ingestion plus LLM extraction from notices, then deploys modular agents with tool access and reflection to outperform vector-RAG baselines on Swiss commercial gazette data across entity resolution, answer quality, and multi-turn
-
ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying
ADAM extracts data from LLM agent memory with up to 100% attack success rate by estimating data distribution and selecting queries via entropy guidance.
-
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
Evo-Memory is a new streaming benchmark and evaluation framework for self-evolving memory in LLM agents, unifying over ten memory modules and introducing the ReMem pipeline for continual improvement on multi-turn and reasoning datasets.
-
SpecHop: Continuous Speculation for Accelerating Multi-Hop Retrieval Agents
SpecHop accelerates multi-hop LLM tool use via continuous multi-threaded speculation with asynchronous verification, approaching oracle latency gains and reducing latency up to 40% on retrieval tasks.
-
GraphMind: From Operational Traces to Self-Evolving Workflow Automation
GraphMind builds and evolves action-centric workflow graphs from operational traces using offline extraction, online multi-agent traversal with LLMs, and Adaptive Traversal Reinforcement to improve automation in cloud database incident handling.
-
AMATA: Adaptive Multi-Agent Trajectory Alignment for Knowledge-Intensive Question Answering
AMATA is an adaptive multi-agent trajectory alignment system that improves factual consistency in knowledge-intensive QA via intra-trajectory preference learning and inter-agent dependency optimization.
-
Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery
PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.
-
AgenticRAG: Agentic Retrieval for Enterprise Knowledge Bases
AgenticRAG equips an LLM with iterative retrieval and navigation tools, delivering 49.6% recall@1 on BRIGHT, 0.96 factuality on WixQA, and 92% correctness on FinanceBench.
-
When Agents Handle Secrets: A Survey of Confidential Computing for Agentic AI
A survey providing a taxonomy of TEE platforms, an agent-centric threat model, and open challenges for applying confidential computing to secure agentic AI systems.
-
SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms
SiriusHelper deploys an LLM agent with intent routing, DeepSearch multi-hop retrieval, and automated SOP distillation to outperform alternatives and reduce ticket volume by 20.8% on Tencent's big data platform.
-
Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines
QPP methods can select query variants that boost end-to-end RAG quality over the original query, though retrieval-optimized variants often fail to produce the best generated answers, revealing a utility gap.
-
Mind DeepResearch Technical Report
MindDR combines a Planning Agent, DeepSearch Agent, and Report Agent with SFT cold-start, Search-RL, Report-RL, and preference alignment to reach competitive scores on research benchmarks using 30B-scale models.
-
ALDEN: Boosting Private Data Extraction from Retrieval-Augmented Generation Systems via Active Learning and Distribution Estimation
ALDEN boosts private data extraction rates from RAG systems by combining active learning for query diversification with dynamic estimation of the underlying knowledge-base topic distribution.
-
RELOOP: Recursive Retrieval with Multi-Hop Reasoner and Planners for Heterogeneous QA
RELOOP unifies retrieval across text, tables, and KGs via hierarchical sequences and dual-agent guided iteration, reporting EM/F1 gains over baselines on HotpotQA, HybridQA/TAT-QA, and MetaQA.
-
Improving Korean-English Cross-Lingual Retrieval: A Data-Centric Study of Language Composition and Model Merging
Language composition in training data creates opposing effects on CLIR and mono-IR performance for Korean-English retrieval, which model merging can partially resolve.
-
Adaptive ToR: Complexity-Aware Tree-Based Retrieval for Pareto-Optimal Multi-Intent NLU
Adaptive ToR uses a query complexity classifier to route multi-intent queries to either fast single-step or deeper hierarchical retrieval, improving accuracy by 9.7% and cutting latency by 37.6% on NLU benchmarks.
-
LARA: Validation-Driven Agentic Supercomputer Workflows for Atomistic Modeling
LARA-HPC introduces a validation-first agentic system with dry-run verification and multi-phase refinement that improves robustness of AI-generated DFT workflows on HPC systems.
-
Exploring Structural Complexity in Normative RAG with Graph-based approaches: A case study on the ETSI Standards
Graph RAG that embeds structural and lexical features from ETSI standards improves retrieval performance over vanilla vector methods on a custom Q&A dataset.
-
Multi-Dimensional Knowledge Profiling with Large-Scale Literature Database and Hierarchical Retrieval
Large-scale profiling of recent AI literature shows growth in safety, multimodal reasoning, and agent studies alongside stabilization in neural machine translation and graph methods.
-
Agentic Reasoning for Large Language Models
The survey structures agentic reasoning for LLMs into foundational, self-evolving, and collective multi-agent layers while distinguishing in-context orchestration from post-training optimization and reviewing applications across domains.
-
Exploiting Web Search Tools of AI Agents for Data Exfiltration
Indirect prompt injection attacks remain effective on LLMs using web search tools, allowing data exfiltration and exposing ongoing weaknesses in current model defenses.
-
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
-
Rethinking Agentic Reinforcement Learning In Large Language Models
The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.
-
Toward Agentic RAG for Ukrainian
Agentic RAG for Ukrainian improves answer accuracy via retries but is still limited by document and page retrieval quality.
-
PAL: Personal Adaptive Learner
PAL is an AI platform that converts lecture videos into real-time adaptive interactive learning with dynamic questions and tailored end-of-session summaries.
-
Automotive Engineering-Centric Agentic AI Workflow Framework
The paper presents the Agentic Engineering Intelligence (AEI) framework for modeling automotive engineering workflows as sequential decision processes with AI agent support.
- FreeRet: MLLMs as Training-Free Retrievers