SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
Pith reviewed 2026-06-27 16:20 UTC · model grok-4.3
The pith
A harness generates training trajectories that teach models when and how to delegate subtasks, producing the strongest results among 30B-scale models on deep research benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A harness that guides the main agent through task decomposition and constrains subagents to return properly formatted summaries produces trajectories that encode correct delegation decisions. Supervised fine-tuning on these trajectories internalizes delegation intelligence into the model weights, enabling SearchSwarm-30B-A3B to achieve the best reported scores among models of comparable scale on long-horizon research benchmarks (68.1 on BrowseComp, 73.3 on BrowseComp-ZH).
What carries the argument
The harness that guides task decomposition, enforces delegation points, and requires subagents to return concise results that conserve the main agent's context budget.
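To make the context-budget mechanism concrete, here is a minimal Python sketch of the delegation pattern the harness is described as enforcing. Everything in it is hypothetical: the paper does not publish this interface, and the names (`MainAgentContext`, `enforce_summary`, `delegate`), the whitespace token count, and the budget values are illustrative assumptions. It shows only the accounting: a subagent may consume any amount of text, but only a capped summary enters the main agent's context.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch: names and budget values are illustrative, not from the paper.
MAX_SUMMARY_TOKENS = 50
MAIN_CONTEXT_BUDGET = 400


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def enforce_summary(text: str, limit: int = MAX_SUMMARY_TOKENS) -> str:
    # Harness-side constraint: a subagent's return value is cut to `limit` tokens.
    return " ".join(text.split()[:limit])


@dataclass
class MainAgentContext:
    budget: int = MAIN_CONTEXT_BUDGET
    messages: List[str] = field(default_factory=list)

    def used(self) -> int:
        return sum(count_tokens(m) for m in self.messages)

    def append(self, message: str) -> None:
        # Refuse any message that would push the main agent past its budget.
        if self.used() + count_tokens(message) > self.budget:
            raise RuntimeError("main-agent context budget exceeded")
        self.messages.append(message)


def delegate(ctx: MainAgentContext, subtasks: List[str],
             run_subagent: Callable[[str], str]) -> None:
    # Each subagent may read arbitrarily long material; only the capped
    # summary, tagged with its subtask, enters the main agent's context.
    for task in subtasks:
        full_transcript = run_subagent(task)
        ctx.append(f"[{task}] " + enforce_summary(full_transcript))


if __name__ == "__main__":
    # Illustrative subagent that "reads" 1,000 words per subtask.
    def fake_subagent(task: str) -> str:
        return " ".join(f"{task}-w{i}" for i in range(1000))

    ctx = MainAgentContext()
    delegate(ctx, ["q1", "q2", "q3"], fake_subagent)
    print(ctx.used())  # 153: three 50-word summaries plus three task tags
```

Each delegation adds only 51 tokens (50 summary words plus the task tag), so three 1,000-word subagent runs fit in a 400-token budget; appending even one full transcript directly would exceed it.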
If this is right
- Models can sustain workflows whose total context demand grows without bound.
- Delegation decisions move from prompt design into learned model behavior.
- Open release of harness and trajectories lets the community scale data collection for this skill.
Where Pith is reading between the lines
- The harness method could be adapted to generate delegation data for domains such as codebases or experimental workflows.
- If the learned behavior generalizes, future agent systems might need smaller context windows than direct long-context approaches.
- Iterative self-application of the trained model inside the harness could produce higher-quality trajectories without additional human design.
Load-bearing premise
Trajectories produced inside the constrained harness encode delegation decisions that still work when the model faces open-ended tasks without harness guidance.
What would settle it
Test the fine-tuned model on a set of research problems whose required decomposition and delegation steps were never present in the harness data; if accuracy falls below untuned baselines, the generalization claim is falsified.
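The falsification test above can be written as a small evaluation routine. This is a hedged sketch, not a protocol from the paper: it assumes each problem carries a hypothetical `signature` field (the sequence of subtask types its decomposition requires) so that problems whose patterns are absent from the harness data can be isolated, and it applies the criterion exactly as stated, a strict drop below the untuned baseline, without the confidence interval a real study would also report.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical: a decomposition "signature" is the tuple of subtask types a problem needs.
Signature = Tuple[str, ...]


def unseen_problems(problems: Sequence[Dict], train_signatures: Set[Signature]) -> List[Dict]:
    # Keep only problems whose decomposition pattern never appears in the harness data.
    return [p for p in problems if tuple(p["signature"]) not in train_signatures]


def accuracy(results: Dict[str, bool], ids: List[str]) -> float:
    if not ids:
        raise ValueError("no unseen problems to evaluate")
    return sum(results[i] for i in ids) / len(ids)


def generalization_verdict(problems: Sequence[Dict], train_signatures: Set[Signature],
                           tuned: Dict[str, bool], baseline: Dict[str, bool]) -> str:
    ids = [p["id"] for p in unseen_problems(problems, train_signatures)]
    acc_tuned, acc_base = accuracy(tuned, ids), accuracy(baseline, ids)
    # Criterion as stated: falsified if the tuned model scores below the untuned baseline.
    return "falsified" if acc_tuned < acc_base else "not falsified"


if __name__ == "__main__":
    problems = [
        {"id": "a", "signature": ["search", "compare"]},
        {"id": "b", "signature": ["search", "aggregate"]},
        {"id": "c", "signature": ["timeline"]},
    ]
    train = {("search", "compare")}
    tuned = {"a": True, "b": False, "c": False}
    base = {"a": False, "b": True, "c": False}
    print(generalization_verdict(problems, train, tuned, base))  # falsified
```

Problem `a` is excluded because its pattern occurs in the training data, so the tuned model's win there does not count toward the verdict.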
Original abstract
Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent's workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.
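The abstract's step from harness-guided trajectories to supervised fine-tuning data can be sketched as follows. The trajectory format, the role names, and the choice to supervise only the main agent's turns are assumptions made for illustration (a common SFT convention, not a detail the abstract confirms); the answer-matching filter is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Turn:
    role: str  # hypothetical roles: "user", "main", "subagent_result"
    text: str


def to_sft_example(trajectory: Sequence[Turn]) -> Tuple[List[str], List[int]]:
    """Flatten one trajectory into text segments and a per-segment loss mask.

    Mask 1 marks the main agent's own turns (decomposition, delegation calls,
    final answer), which are the supervised targets. The task text and the
    subagents' returned summaries get mask 0: the model conditions on them
    but is not trained to produce them.
    """
    segments: List[str] = []
    mask: List[int] = []
    for turn in trajectory:
        segments.append(f"<{turn.role}>{turn.text}</{turn.role}>")
        mask.append(1 if turn.role == "main" else 0)
    return segments, mask


def build_sft_dataset(trajectories: Sequence[Sequence[Turn]],
                      final_answers: Sequence[str],
                      references: Sequence[str]) -> List[Tuple[List[str], List[int]]]:
    # Hypothetical filter: keep only trajectories whose final answer matches the reference.
    return [
        to_sft_example(traj)
        for traj, answer, ref in zip(trajectories, final_answers, references)
        if answer.strip() == ref.strip()
    ]


if __name__ == "__main__":
    traj = [
        Turn("user", "Which year did X happen?"),
        Turn("main", "delegate: search for X's date"),
        Turn("subagent_result", "X happened in 1999."),
        Turn("main", "answer: 1999"),
    ]
    segments, mask = to_sft_example(traj)
    print(mask)  # [0, 1, 0, 1]
```

Masking the subagent summaries keeps the model from learning to hallucinate search results while still teaching it when to delegate and how to use what comes back.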
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SearchSwarm, a preliminary method for acquiring delegation intelligence in agentic LLMs for long-horizon deep research. A harness guides task decomposition and constrains subagent returns to produce trajectories that are used as supervised fine-tuning data; the resulting SearchSwarm-30B-A3B model reports 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, stated as the best results among models of comparable scale. The authors note the scarcity of natural training data for delegation and commit to releasing the harness, model weights, and training data.
Significance. If the central claim holds, the work would supply a concrete, open-source recipe for synthesizing delegation trajectories at scale, addressing a recognized bottleneck for long-horizon agentic systems. The planned release of harness, weights, and data constitutes a concrete community contribution that would support reproducibility and follow-on experiments.
major comments (1)
- [Abstract] Paragraph on harness-guided trajectories: the assertion that these trajectories 'naturally encode correct delegation decisions', and that SFT internalizes those decisions so they generalize to unconstrained, open-ended tasks, is load-bearing for the central claim. Yet the manuscript supplies no ablation comparing harness-on versus harness-off inference, no description of how the harness is removed at test time, and no held-out evaluation of delegation quality outside the BrowseComp harness setting. This omission leaves open whether the reported gains reflect learned delegation intelligence or continued harness effects.
Simulated Author's Rebuttal
We thank the referee for this constructive comment, which correctly identifies a key evidentiary gap in supporting the central claim of internalized delegation intelligence. We address the point directly below and commit to revisions.
Point-by-point responses
Referee: [Abstract] Harness-guided trajectories: no harness-on versus harness-off ablation, no description of harness removal at test time, and no held-out evaluation of delegation quality outside the BrowseComp harness setting, leaving open whether gains reflect learned delegation intelligence or continued harness effects.
Authors: We agree the manuscript requires clarification on this point to substantiate the claim. The harness is employed exclusively during trajectory synthesis to produce high-quality SFT data; at inference the fine-tuned model is intended to operate without it. In the revised manuscript we will add: (1) an explicit section describing harness removal at test time and the unconstrained inference protocol; (2) an ablation comparing harness-on versus harness-off performance on a representative subset of BrowseComp tasks to isolate the contribution of learned delegation; and (3) a limitations discussion noting the current reliance on BrowseComp as the primary held-out benchmark while outlining plans for additional delegation-specific metrics. These changes will directly address whether observed gains derive from internalized capabilities rather than residual harness effects. revision: yes
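The promised harness-on versus harness-off ablation amounts to a paired comparison on the same tasks. A minimal sketch, assuming per-task correctness is recorded under both conditions, is a paired bootstrap confidence interval for the accuracy difference; the bootstrap method, the resample count, and the function name are illustrative assumptions, not the authors' protocol. An interval that straddles zero would mean the harness adds no detectable benefit at inference, which is what the internalization claim predicts.

```python
import random
from typing import List, Tuple


def paired_bootstrap(on: List[bool], off: List[bool], n_resamples: int = 2000,
                     seed: int = 0) -> Tuple[float, float, float]:
    """Return (observed difference, 2.5th pct, 97.5th pct) for acc(on) - acc(off).

    `on[i]` and `off[i]` are correctness on the same task i with and without
    the harness at inference; tasks are resampled as pairs to keep the pairing.
    """
    if len(on) != len(off) or not on:
        raise ValueError("need equal-length, non-empty paired results")
    n = len(on)
    rng = random.Random(seed)
    observed = (sum(on) - sum(off)) / n
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(on[i] - off[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(0.025 * n_resamples)]
    hi = diffs[int(0.975 * n_resamples) - 1]
    return observed, lo, hi


if __name__ == "__main__":
    on = [True, True, False, True, False, True, True, False]
    off = [True, False, False, True, False, True, False, False]
    diff, lo, hi = paired_bootstrap(on, off)
    print(f"acc(on) - acc(off) = {diff:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Resampling tasks as pairs, rather than each condition independently, controls for task difficulty, which matters on a small representative subset.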
Circularity Check
No circularity: empirical SFT on harness trajectories
The paper describes an empirical pipeline: a harness generates trajectories that are then used as SFT data to train delegation behavior. No equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations appear in the provided text. The central claim (benchmark gains after SFT) is a measured outcome rather than a quantity forced by construction from the inputs. Generalization from harness to open-ended use is an unproven assumption but does not constitute circularity under the defined patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: supervised fine-tuning on trajectories generated under harness constraints generalizes to unconstrained tasks.