Our experiments are built upon the internal verl reinforcement learning framework and executed on a cluster equipped with Huawei Ascend NPUs.

A APPENDIX A · 2048


representative citing papers

ReSeek: A Self-Correcting Framework for Search Agents with Instructive Rewards

cs.CL · 2025-10-01 · unverdicted · novelty 6.0

ReSeek adds self-correction via a JUDGE action and a dense instructive reward (correctness plus utility) to RL training of search agents, yielding higher success and faithfulness on a new contamination-resistant benchmark.
