SSP trains search agents without supervision by co-evolving a task proposer and solver through self-play, with RAG verification ensuring ground-truth accuracy, yielding uniform gains on benchmarks in both from-scratch and continued RL settings.
2 Pith papers cite this work.
verdicts
UNVERDICTED 2
representative citing papers
Knowledge-graph paths reused as intermediate supervision improve self-evolving search agents over standard Search Self-Play on seven QA benchmarks by supplying relational context and graded waypoint rewards.
citing papers explorer
- Search Self-play: Pushing the Frontier of Agent Capability without Supervision
- Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents