Search and refine during think: Facilitating knowledge refinement for improved retrieval-augmented reasoning

Yaorui Shi, Sihang Li, Chang Wu, Zhiyuan Liu, Junfeng Fang, Hengxing Cai, An Zhang, Xiang Wang · 2025 · arXiv 2505.11277

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Latent Abstraction for Retrieval-Augmented Generation

cs.CL · 2026-04-20 · unverdicted · novelty 7.0

LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.

MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

cs.CL · 2025-11-04 · unverdicted · novelty 7.0

MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.

MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search

cs.IR · 2026-04-19 · unverdicted · novelty 6.0

MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle

cs.CL · 2025-10-17 · unverdicted · novelty 6.0 · 2 refs

EvolveR enables LLM agents to self-evolve via a closed loop of distilling interaction trajectories into strategic principles offline and retrieving them to guide online decisions with policy reinforcement, yielding better results on multi-hop QA benchmarks.

Learning to Trust: Dynamic Utilization of Retrieval-Augmented Generation for E-commerce Search Relevance

cs.IR · 2025-10-13 · unverdicted · novelty 6.0

DyKnow-RAG uses Group Relative Policy Optimization with dual-group rollouts and posterior-driven advantage scaling to optimize context utilization in RAG for e-commerce relevance, showing offline gains and production lifts when deployed at Taobao.

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

cs.AI · 2025-09-02 · accept · novelty 6.0

Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

citing papers explorer

Showing 6 of 6 citing papers.

Latent Abstraction for Retrieval-Augmented Generation cs.CL · 2026-04-20 · unverdicted · none · ref 33
LAnR unifies retrieval-augmented generation inside a single LLM by deriving dense retrieval vectors from a [PRED] token's hidden states and using entropy to adaptively stop retrieval, outperforming prior RAG on six QA benchmarks with better efficiency.
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning cs.CL · 2025-11-04 · unverdicted · none · ref 21
MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.
MemSearch-o1: Empowering Large Language Models with Reasoning-Aligned Memory Growth in Agentic Search cs.IR · 2026-04-19 · unverdicted · none · ref 22
MemSearch-o1 mitigates memory dilution in agentic LLM search through reasoning-aligned token-level memory growth, retracing with a contribution function, and path reorganization, improving reasoning activation on benchmarks.
EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle cs.CL · 2025-10-17 · unverdicted · none · ref 25 · 2 links
EvolveR enables LLM agents to self-evolve via a closed loop of distilling interaction trajectories into strategic principles offline and retrieving them to guide online decisions with policy reinforcement, yielding better results on multi-hop QA benchmarks.
Learning to Trust: Dynamic Utilization of Retrieval-Augmented Generation for E-commerce Search Relevance cs.IR · 2025-10-13 · unverdicted · none · ref 19
DyKnow-RAG uses Group Relative Policy Optimization with dual-group rollouts and posterior-driven advantage scaling to optimize context utilization in RAG for e-commerce relevance, showing offline gains and production lifts when deployed at Taobao.
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey cs.AI · 2025-09-02 · accept · none · ref 296
Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.

Search and refine during think: Facilitating knowledge refinement for improved retrieval-augmented reasoning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer