2501.04899v1
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
SAAS applies RL with boundary modeling via rollout contrasts, boundary-aware rewards, and staged optimization to reduce over-search in agentic LLMs while preserving accuracy.
citing papers explorer
- HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation
  HiPRAG adds hierarchical process rewards to RL training for agentic RAG, reducing over-search to 2.3% and achieving 65.4-67.2% accuracy on seven QA benchmarks across 3B and 7B models.
- Hybrid Adversarial Defence for Natural Language Understanding Tasks
  Hybrid entropy-uncertainty-geometric defence improves clean accuracy by up to 43% and adversarial robustness by up to 65% on NLU and security benchmarks.