pith. sign in

arxiv: 2508.19113 · v3 · pith:OYBNBUGQnew · submitted 2025-08-26 · 💻 cs.AI

Hybrid Deep Searcher: Scalable Parallel and Sequential Search Reasoning

classification 💻 cs.AI
keywords searchreasoningparallelsequentialaggregationstructuredapproachesdeep
0
0 comments X
read the original abstract

Large reasoning models (LRMs) combined with retrieval-augmented generation (RAG) have enabled deep research agents capable of multi-step reasoning with external knowledge retrieval. However, we find that existing approaches rarely demonstrate test-time search scaling. Methods that extend reasoning through single-query sequential search suffer from limited evidence coverage, while approaches that generate multiple independent queries per step often lack structured aggregation, hindering deeper sequential reasoning. We propose a hybrid search strategy to address these limitations. We introduce HybridDeepSearcher, a structured search agent that integrates parallel query expansion with explicit evidence aggregation before advancing to deeper sequential reasoning. To supervise this behavior, we introduce HDS-QA, a novel dataset that guides models to combine broad parallel search with structured aggregation through supervised reasoning-query0retrieval trajectories containing parallel sub-queries. Across five benchmarks, HybridDeepSearcher significantly outperforms the state-of-the-art, improving F1 scores by +15.9 on FanOutQA and +9.2 on a subset of BrowseComp. Further analysis shows its consistent test-time search scaling: performance improves as additional search turns or calls are allowed, while competing methods plateau.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

    cs.CL 2025-10 unverdicted novelty 5.0

    ERL trains LLMs to erase faulty reasoning steps and regenerate them in place, yielding gains of up to 8.48% EM on multi-hop QA benchmarks like HotpotQA.