HiPRAG adds hierarchical process rewards to RL training for agentic RAG, reducing over-search to 2.3% and achieving 65.4-67.2% accuracy on seven QA benchmarks across 3B and 7B models.
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
Representative citing papers
SAAS applies RL with boundary modeling via rollout contrasts, boundary-aware rewards, and staged optimization to reduce over-search in agentic LLMs while preserving accuracy.
Citing papers explorer
- HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
  HiPRAG adds hierarchical process rewards to RL training for agentic RAG, reducing over-search to 2.3% and achieving 65.4-67.2% accuracy on seven QA benchmarks across 3B and 7B models.
- SAAS: Self-Aware Reinforcement Learning for Over-Search Mitigation in Agentic Search
  SAAS applies RL with boundary modeling via rollout contrasts, boundary-aware rewards, and staged optimization to reduce over-search in agentic LLMs while preserving accuracy.