Effective Reinforcement Learning for Agentic Search by Recycling Zero-Variance Queries During Training

Bruno Martins; Chenyan Xiong; João Coelho; João Magalhães

arxiv: 2606.10709 · v1 · pith:SGIXCAWFnew · submitted 2026-06-09 · 💻 cs.IR · cs.AI

Pith reviewed 2026-06-27 11:36 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords reinforcement learning · LLM agents · query recycling · zero-variance queries · multi-hop QA · GRPO · synthetic data

The pith

Recycling zero-variance queries during RL training enables a 1.7B model to match larger models on multi-hop QA tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that in outcome-reward reinforcement learning for LLM search agents, queries that yield all successes or all failures stop contributing to updates. It proposes that these zero-variance queries can regain variance as the model improves or drifts, so recycling them back into the training pool allows the data distribution to adapt. This method lets a 1.7 billion parameter model trained on synthetic data achieve 66.0 average Pass@1 across seven benchmarks, matching or beating systems up to 7B parameters that use benchmark supervision.
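
The claim that all-success and all-failure groups stop contributing follows from the group-normalized advantage commonly used in GRPO-style methods. The form below is the standard version, written in notation assumed here ($G$ rollouts per query, outcome rewards $r_i$, and a small stabilizer $\epsilon$ that many implementations add); the paper's exact variant is not reproduced on this page.

```latex
\hat{A}_i = \frac{r_i - \mu}{\sigma + \epsilon},
\qquad
\mu = \frac{1}{G}\sum_{j=1}^{G} r_j,
\qquad
\sigma = \sqrt{\frac{1}{G}\sum_{j=1}^{G} \left(r_j - \mu\right)^2}
```

When every reward in a group is identical, $r_i = \mu$ for all $i$, so every $\hat{A}_i = 0$ and the group adds nothing to the policy gradient.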

Core claim

Queries flip between zero-variance and signal-bearing states during training. Returning zero-variance groups to a mutable pool for future resampling makes the effective training distribution co-evolve with the policy, supplying roughly three quarters of the effective batch by the end of training through both recovery from improvement and handling of policy drift.
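
A minimal sketch of how such flips could be tracked, assuming binary outcome rewards and non-empty rollout groups. The state names and the `flip_kind` labels are hypothetical; in particular, mapping "all-incorrect to mixed" onto recovery and "all-correct to mixed" onto drift is one reading of the summary above, not a definition confirmed by the paper.

```python
def group_state(rewards):
    """Classify one rollout group of 0/1 rewards (assumed non-empty)."""
    if all(r == 1 for r in rewards):
        return "all_correct"    # too easy: zero variance
    if all(r == 0 for r in rewards):
        return "all_incorrect"  # too hard: zero variance
    return "mixed"              # signal-bearing


def flip_kind(previous_state, current_state):
    """Label a transition from a zero-variance state into a mixed one.

    Returns None when the query did not just regain variance.
    The two labels are an assumed interpretation, not the paper's wording.
    """
    if current_state != "mixed" or previous_state == "mixed":
        return None
    return "recovery" if previous_state == "all_incorrect" else "drift"
```

Logging `flip_kind` for each resampled query would give the kind of per-step breakdown the summary describes.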

What carries the argument

Query recycling, the process of returning zero-variance rollout groups to the mutable training pool instead of discarding them.
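
The recycling step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the pool is a plain list of query ids, `rollout_fn` is a hypothetical callable that returns one group of 0/1 rewards, and signal-bearing queries are treated as consumed after use (the page does not say what happens to them).

```python
import random


def is_zero_variance(rewards):
    """True when every rollout in the group received the same reward."""
    return len(set(rewards)) <= 1


def training_step(pool, rollout_fn, batch_size, rng):
    """Sample queries, roll them out, and recycle zero-variance groups.

    pool       : mutable list of query ids (modified in place)
    rollout_fn : hypothetical callable, query -> list of 0/1 rewards
    Returns (effective, recycled); only `effective` would feed the update.
    """
    rng.shuffle(pool)
    sampled = [pool.pop() for _ in range(min(batch_size, len(pool)))]
    effective, recycled = [], []
    for query in sampled:
        rewards = rollout_fn(query)
        if is_zero_variance(rewards):
            recycled.append(query)              # returned, not discarded
        else:
            effective.append((query, rewards))  # assumed consumed after use
    pool.extend(recycled)
    return effective, recycled
```

Because recycled queries re-enter the same pool, a later call can sample them again once the policy has changed.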

If this is right

A 1.7B parameter model reaches 66.0 average Pass@1 on seven multi-hop QA benchmarks using only synthetic data.
Recycled queries account for about three quarters of the effective training batch by the end of training.
The contributions split between queries that recover variance after policy improvement and those affected by policy drift.
The approach works without relying on benchmark-derived supervision.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar recycling could improve efficiency in other reinforcement learning settings where rollout costs are high and policies change over time.
The method might reduce the need for large amounts of curated training data in agent training.
Tracking which queries are recycled could reveal patterns in how search policies evolve on different query types.

Load-bearing premise

That zero-variance queries will later produce mixed outcomes when resampled after the policy has changed.
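
To make the premise concrete: if one assumes the rollouts in a group are independent and each succeeds with probability p, a group of size G is mixed with probability 1 - p^G - (1 - p)^G. The sketch below computes that quantity; the independence assumption and the group size are illustrative choices, not values taken from the paper.

```python
def p_mixed(p: float, group_size: int) -> float:
    """Probability that a group of independent rollouts is mixed.

    p is the assumed per-rollout success probability for one query.
    A group is mixed when it has at least one success and one failure.
    """
    return 1.0 - p**group_size - (1.0 - p) ** group_size
```

Under this model a query stuck at p = 0 or p = 1 can never produce signal, while even a small shift in p after a policy update makes a mixed group possible on the next resample.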

What would settle it

Training the same model without query recycling and observing whether it still reaches 66.0 average Pass@1 or falls short.

Figures

Figures reproduced from arXiv: 2606.10709 by Bruno Martins, Chenyan Xiong, João Coelho, João Magalhães.

**Figure 2.** Distribution of the synthetic query pool ac…

**Figure 3.** Cumulative queries seen during training on…

**Figure 4.** Composition of the effective batch (i.e. signal…

**Figure 5.** Composition of the sampled batch (i.e. all…

**Figure 6.** Cumulative recycled queries on Qwen3-1.7B…

**Figure 7.** Average trajectory statistics during Qwen3-1.7B RL training under recycling, grouped by synthetic query…

**Figure 8.** Composition of sampled candidate queries throughout Qwen3-1.7B recycling-based training, stratified…

Original abstract

The use of GRPO-style algorithms has become the standard strategy for training LLM search agents under outcome-only rewards. With these algorithms, a query contributes to parameter updates only when its rollout group mixes successes and failures; all-correct (too-easy) and all-incorrect (too-hard) groups are zero-variance and waste rollout cost. Existing approaches treat zero-variance as a static property and either discard or pre-filter such groups. We hypothesize and empirically validate that queries flip between zero-variance and signal-bearing states as the policy evolves during training. Building on this intuition, we propose query recycling, which returns zero-variance groups to a mutable pool for future resampling, so that the effective training distribution co-evolves with the policy. With the proposed technique, a 1.7B parameter model trained on synthetic data can reach 66.0 average Pass@1 accross seven multi-hop QA benchmarks, matching or surpassing systems with up to 7B parameters trained on benchmark-derived supervision. Analysis of recycling patterns shows that recycled queries supply roughly three quarters of the effective batch by the end of training, with contributions split between recovery from policy improvement and policy drift.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Query recycling in GRPO lets a 1.7B model hit 66.0 Pass@1 on seven benchmarks with synthetic data, but the gain may just reflect extra rollouts rather than the dynamic pool mechanism.

The paper introduces query recycling for GRPO training of search agents. Zero-variance rollout groups are returned to a mutable pool instead of being dropped, on the claim that queries flip between zero-variance and signal-bearing states as the policy changes. The result is a 1.7B model reaching 66.0 average Pass@1 across seven multi-hop QA benchmarks using only synthetic data.

The work does a decent job documenting the recycling behavior. Recycled queries end up supplying about three quarters of the effective batch by the end of training, with the contributions split between cases where the policy improves on a query and cases of policy drift. That observation is concrete and directly tied to their hypothesis.

The soft spot is the missing matched-compute control. Recycling zero-variance groups increases the total rollouts allocated to individual queries across epochs. Without a baseline that keeps the same total number of rollouts, the same synthetic generator, and the same GRPO settings but uses a static non-recyclable pool, the performance lift could come from simply sampling more on queries that later become informative. The abstract gives no seed variance numbers or pool-size ablations either, so the experimental detail is thin.
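
A toy harness shows what "matched compute" means in terms of rollout accounting. Everything here is hypothetical: `run_arm`, its parameters, and the frozen per-query success probabilities are illustrative stand-ins, so the sketch demonstrates only the budget bookkeeping, not learning dynamics or the paper's setup.

```python
import random


def run_arm(queries, success_prob, group_size, rollout_budget, recycle, seed=0):
    """Spend up to `rollout_budget` rollouts in whole groups.

    recycle=True  : zero-variance queries go back into the pool.
    recycle=False : zero-variance queries are dropped (static pool).
    The policy is frozen (fixed success_prob), which is a toy assumption.
    """
    rng = random.Random(seed)
    pool = list(queries)
    used = 0
    effective = 0
    while pool and used + group_size <= rollout_budget:
        query = pool.pop(rng.randrange(len(pool)))
        rewards = [int(rng.random() < success_prob[query]) for _ in range(group_size)]
        used += group_size
        if len(set(rewards)) > 1:
            effective += 1      # mixed group: would contribute to the update
        elif recycle:
            pool.append(query)  # recycling arm only
    return {"rollouts_used": used, "effective_groups": effective, "pool_left": len(pool)}
```

A fair comparison would check that `rollouts_used` is identical in both arms before comparing downstream accuracy. The static arm needs a pool large enough that it does not run dry before the budget is spent.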

This is aimed at people training outcome-only RL agents for search or tool use who want to stretch small models without curated data. A reader trying to reduce wasted rollouts in GRPO setups would get a practical idea to test.

Send it for peer review. The core technique is simple enough that referees can check the controls and see whether the attribution holds.

Referee Report

2 major / 1 minor

Summary. The paper proposes query recycling for GRPO-style RL training of LLM search agents under outcome-only rewards. It hypothesizes that queries dynamically flip between zero-variance (all-correct or all-incorrect) and signal-bearing states as the policy evolves, and introduces a mutable pool that returns zero-variance groups for future resampling so the effective training distribution co-evolves with the policy. Empirically, a 1.7B model trained on synthetic data reaches 66.0 average Pass@1 across seven multi-hop QA benchmarks, matching or surpassing models of up to 7B parameters trained on benchmark-derived data; analysis indicates recycled queries supply ~75% of the effective batch by the end of training.

Significance. If the attribution to the recycling mechanism holds, the result would be significant for efficient RL in agentic search: it shows how to adaptively allocate rollout budget without discarding queries, enabling strong performance from smaller models on synthetic data. The reported recycling-pattern analysis is a concrete strength that could inform future work on dynamic training distributions.

major comments (2)

[Results section] Headline 66.0 Pass@1 claim and recycling analysis: the manuscript reports no matched-compute control that fixes total GRPO rollouts while using a static (non-recyclable) query pool. Without this ablation the performance gain cannot be unambiguously attributed to the hypothesized state-flipping/co-evolution dynamic rather than simply allocating more samples to queries that later become informative; the ~75% recycled-batch statistic is consistent with both interpretations.
[Analysis of recycling patterns] Three-quarters effective-batch claim: no details are provided on variance across random seeds, on how the 'effective batch' is precisely defined, or on an ablation varying the recycling-pool size; these omissions make the quantitative support for the co-evolution hypothesis difficult to evaluate.
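
The quantities requested in the second comment could be computed along these lines. This is a sketch under assumptions: "effective" is taken to mean a group with mixed outcomes, which is one plausible definition rather than the paper's confirmed one, and the function names and input layout are hypothetical.

```python
from statistics import mean, stdev


def effective_batch_stats(groups):
    """Summarize one sampled batch.

    groups: list of (is_recycled, rewards) pairs, one per rollout group.
    A group counts as effective when its rewards are not all equal
    (assumed definition).
    """
    effective = [(rec, r) for rec, r in groups if len(set(r)) > 1]
    if not effective:
        return {"effective_frac": 0.0, "recycled_share": 0.0}
    recycled = sum(1 for rec, _ in effective if rec)
    return {
        "effective_frac": len(effective) / len(groups),
        "recycled_share": recycled / len(effective),
    }


def summarize_over_seeds(values):
    """Mean and sample standard deviation of a per-seed metric."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)
```

Reporting `recycled_share` per training step, together with the spread from `summarize_over_seeds`, would let readers judge how stable the three-quarters figure is.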

minor comments (1)

[Abstract] 'accross' is a typo and should read 'across'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, agreeing where revisions are needed to strengthen attribution and analysis.

Referee: [Results section] Headline 66.0 Pass@1 claim and recycling analysis: the manuscript reports no matched-compute control that fixes total GRPO rollouts while using a static (non-recyclable) query pool. Without this ablation the performance gain cannot be unambiguously attributed to the hypothesized state-flipping/co-evolution dynamic rather than simply allocating more samples to queries that later become informative; the ~75% recycled-batch statistic is consistent with both interpretations.

Authors: We agree that the current experiments do not include a matched-compute control with a fixed total rollout budget and static query pool, which limits unambiguous attribution to the co-evolution dynamic. In the revised manuscript we will add this ablation, comparing query recycling against a static-pool baseline under identical total GRPO rollouts, to better isolate the effect of dynamic resampling from simply reallocating samples to later-informative queries.
Revision: yes

Referee: [Analysis of recycling patterns] Three-quarters effective-batch claim: no details are provided on variance across random seeds, on how the 'effective batch' is precisely defined, or on an ablation varying the recycling-pool size; these omissions make the quantitative support for the co-evolution hypothesis difficult to evaluate.

Authors: We will expand the recycling-pattern analysis in revision to (i) provide a precise definition of 'effective batch', (ii) report results with variance across multiple random seeds, and (iii) include an ablation on recycling-pool size. These additions will make the quantitative claims more robust and easier to evaluate.
Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical technique validated on benchmarks

The paper proposes query recycling as an empirical training technique for GRPO-style RL on LLM search agents. The central result is a reported benchmark average of 66.0 Pass@1 for a 1.7B model on synthetic data, presented as an outcome of the method rather than a mathematical derivation. No equations, fitted parameters, or self-citations are shown to reduce the performance claim to a tautology or input by construction. The hypothesis about query variance states is stated as an intuition that is then empirically validated, with no load-bearing uniqueness theorem or ansatz imported from prior self-work. The work is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that GRPO-style group-relative policy optimization is the appropriate base algorithm and that the observed variance-flip behavior is general enough to justify the recycling pool. No free parameters or invented entities are described in the abstract.

axioms (1)

Domain assumption: GRPO-style algorithms are the standard strategy for training LLM search agents under outcome-only rewards.
Stated in the opening sentence of the abstract as background for the zero-variance problem.

pith-pipeline@v0.9.1-grok · 5748 in / 1349 out tokens · 15539 ms · 2026-06-27T11:36:55.514644+00:00 · methodology
