arXiv: 2605.11732 · v1 · submitted 2026-05-12 · cs.IR · cs.CL · cs.MA · cs.MM

Recognition: 2 theorem links


AgentDisCo: Towards Disentanglement and Collaboration in Open-ended Deep Research Agents

Jiarui Jin, Zexuan Yan, Shijian Wang, Wenxiang Jiao, Yuan Lu


Pith reviewed 2026-05-13 05:33 UTC · model grok-4.3

classification cs.IR · cs.CL · cs.MA · cs.MM

keywords disentangled agents · collaborative agents · deep research · adversarial optimization · meta-optimization · policy bank · research benchmarks · agentic architecture


The pith

AgentDisCo separates exploration and exploitation into critic and generator agents that iteratively refine outlines before a report writer synthesizes them.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that deep research improves when the conflicting goals of gathering new information and using what is already known are handled by two separate agents rather than one combined module. A critic agent scores the current outline and sharpens the search queries, while a generator agent pulls fresh results and rewrites the outline in reply. These exchanges continue until the outline stabilizes, at which point a downstream writer produces the finished report. The architecture also includes a meta-optimization step that lets the agents evaluate each other’s outputs and automatically build a reusable collection of effective design strategies. Tests on existing benchmarks reach or exceed current leading systems, and the work adds a new benchmark drawn from real browsing histories plus a module that turns reports into visual posters.

Core claim

AgentDisCo formulates deep research as an adversarial optimization problem between information exploration and exploitation. It employs a critic agent to evaluate generated outlines and refine search queries and a generator agent to retrieve updated results and revise outlines accordingly. The iteratively refined outline is passed to a downstream report writer that synthesizes a comprehensive research report. The workflow supports both handcrafted and automatically discovered design strategies via a meta-optimization harness in which the generator is repurposed as a scoring agent to evaluate critic outputs and generate quality signals, enabling construction of a policy bank of reusable agent

What carries the argument

The critic-generator adversarial loop that alternates outline evaluation and revision, supported by a meta-optimization harness that builds a policy bank of design strategies.
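The alternation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `LoopState`, `refine_outline`, `max_rounds`, and `min_gain` are hypothetical, the stopping rule (score gain below a threshold) is an assumed reading of "until the outline stabilizes", and the toy critic and generator stand in for LLM and search calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class LoopState:
    outline: str = ""
    queries: List[str] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)


def refine_outline(
    user_query: str,
    critic: Callable[[str, str], Tuple[float, List[str]]],
    generator: Callable[[str, str, List[str]], str],
    max_rounds: int = 5,
    min_gain: float = 0.1,
) -> LoopState:
    """Alternate critic and generator until the score stops improving."""
    state = LoopState()
    for _ in range(max_rounds):
        # Critic: score the current outline and propose sharper search queries.
        score, queries = critic(user_query, state.outline)
        state.scores.append(score)
        state.queries = queries
        # Assumed stopping rule: the outline has stabilized once the gain is small.
        if len(state.scores) >= 2 and state.scores[-1] - state.scores[-2] < min_gain:
            break
        # Generator: retrieve with the new queries and rewrite the outline.
        state.outline = generator(user_query, state.outline, queries)
    return state


def toy_critic(user_query: str, outline: str) -> Tuple[float, List[str]]:
    # Hypothetical scoring: one point per outline section, capped at 3.
    sections = [line for line in outline.split("\n") if line]
    score = float(min(len(sections), 3))
    return score, [f"{user_query} aspect {len(sections) + 1}"]


def toy_generator(user_query: str, outline: str, queries: List[str]) -> str:
    # Hypothetical retrieval: each query contributes one new outline section.
    return outline + "".join(f"\n{q}" for q in queries)


final = refine_outline("solid-state batteries", toy_critic, toy_generator, max_rounds=10)
print(final.scores)
print(final.outline.strip())
```

Keeping the critic and generator as separate callables mirrors the disentanglement the paper argues for: either side can be swapped or ablated without touching the loop, which is also what a single-module baseline comparison would require.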

Load-bearing premise

That splitting exploration and exploitation into separate agents and letting them optimize against each other produces better research outlines than handling both tasks inside a single module.

What would settle it

A direct comparison on the same benchmarks in which an otherwise identical single-module agent produces outlines and reports of equal or higher measured quality.

Figures

Figures reproduced from arXiv: 2605.11732 by Jiarui Jin, Shijian Wang, Wenxiang Jiao, Yuan Lu, Zexuan Yan.

**Figure 1.** Comparison of deep research paradigms. (a) The outline-iterative-optimization paradigm couples outline generation and search query formulation within a single model; (b) the report-iterative-optimization paradigm similarly entangles report generation with search query formulation; (c) in contrast, AgentDisCo disentangles the outline generator and the search query generator into separate models, and further…

**Figure 2.** Overview of the architecture and applications of AgentDisCo. AgentDisCo spans the full pipeline from mining latent deep research queries in user interaction histories to producing structured reports and rendering visually rich posters. This end-to-end design realizes the vision of “AutoResearch Your Interest”: automatically tracking evolving user interests and delivering personalized deep research recommend…

**Figure 3.** Overview of the harness optimization in AgentDisCo. AgentDisCo can automatically discover design strategies by constructing a meta-optimization harness around the adversarial optimization loop. Specifically, the generator agent, originally tasked with producing target outlines, is repurposed as a scoring agent that evaluates the critic agent’s outputs and generates quality signals, thereby enabling systemati…

**Figure 4.** Overview of the render agent in AgentDisCo. Our render agent accepts as input a report in either PDF or Markdown format. It first extracts the salient features and structural elements from the report, and then reorganizes the content into one of two presentation modalities: an HTML-based layout or a slide-style layout. Notably, both modalities support pluggable templates and styling components, enabling fl…

**Figure 5.** A showcase of AgentDisCo. The figure illustrates the end-to-end processing pipeline of an input query: starting from the planner agent, proceeding through the iterative optimization loop between the (outline) critic agent and the (outline) generator agent, and finally passing to the writer agent and the render agent.

**Figure 6.** Comparison between our proposed GALA benchmark and the existing benchmarks DeepResearchBench, DeepResearchGym, and DeepConsult.

**Figure 7.** End-to-end scores with varying rounds of …

**Figure 9.** Consistency of harness optimization over critic agent to end-to-end optimization with varying …

**Figure 10.** Gallery of diverse template styles and types, including slides, posters, and portrait-format images. Our render agent offers extensive stylistic choices, accommodating diverse user preferences.

{ "evaluation": { "completeness": { "score": 0, "reasoning": "explanation of scoring rationale" }, "diversity": { "score": 0, "reasoning": "explanation of scoring rationale" }, "search_coverage": { "score": 0, "rea…
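The evaluation fragment attached to the Figure 10 entry hints at a per-dimension scoring payload. A minimal sketch of how such a payload might be aggregated follows, assuming (this is not confirmed by the fragment itself) that each dimension is scored on a 0 to 10 scale and that the total is the plain average. The function name `total_score` and the example scores are made up for illustration; only the three dimension names come from the fragment.

```python
import json
from statistics import mean


def total_score(evaluation_json: str) -> float:
    """Average the per-dimension scores of an evaluation payload."""
    dimensions = json.loads(evaluation_json)["evaluation"]
    scores = [float(entry["score"]) for entry in dimensions.values()]
    # Assumed range check: dimension scores are expected to lie in [0, 10].
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("dimension scores are expected in [0, 10]")
    return mean(scores)


# Illustrative payload; the scores are invented, not results from the paper.
example = json.dumps({
    "evaluation": {
        "completeness": {"score": 8, "reasoning": "..."},
        "diversity": {"score": 6, "reasoning": "..."},
        "search_coverage": {"score": 7, "reasoning": "..."},
    }
})
print(total_score(example))
```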

read the original abstract

In this paper, we present AgentDisCo, a novel Disentangled and Collaborative agentic architecture that formulates deep research as an adversarial optimization problem between information exploration and exploitation. Unlike existing approaches that conflate these two processes into a single module, AgentDisCo employs a critic agent to evaluate generated outlines and refine search queries, and a generator agent to retrieve updated results and revise outlines accordingly. The iteratively refined outline is then passed to a downstream report writer that synthesizes a comprehensive research report. The overall workflow supports both handcrafted and automatically discovered design strategies via a meta-optimization harness, in which the generator agent is repurposed as a scoring agent to evaluate critic outputs and generate quality signals. Powerful code-generation agents (e.g., Claude-Code, Codex) systematically explore agent configurations and construct a policy bank, a structured repository of reusable design strategies, enabling the framework to self-refine without extensive human intervention. We evaluate AgentDisCo on three established deep research benchmarks (DeepResearchBench, DeepConsult, DeepResearchGym) using Gemini-2.5-Pro, achieving performance comparable to or surpassing leading closed-source systems. Observing that existing benchmarks inadequately reflect real-world user needs, we introduce GALA (General AI Life Assistants), a benchmark that mines latent research interests from users' historical browsing behavior. We further develop a rendering agent that converts research reports into visually rich poster presentations, and demonstrate an end-to-end product, AutoResearch Your Interest, which delivers personalized deep research recommendations derived from individual browsing histories.
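The meta-optimization harness in the abstract can be pictured as a ranking loop over candidate critic configurations. The sketch below is an assumption-laden illustration rather than the paper's method: `Strategy`, `build_policy_bank`, `keep_top`, and the toy scoring rule are all hypothetical, the hand-written candidate list stands in for configurations that a code-generation agent would propose, and `score_output` plays the role of the generator agent repurposed as a scorer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Strategy:
    name: str
    critic_prompt: str


def build_policy_bank(
    candidates: List[Strategy],
    tasks: List[str],
    run_critic: Callable[[Strategy, str], str],
    score_output: Callable[[str, str], float],
    keep_top: int = 2,
) -> List[Dict[str, object]]:
    """Rank candidate critic strategies by mean quality signal and keep the best."""
    if not tasks:
        raise ValueError("at least one task is required")
    ranked = []
    for strategy in candidates:
        # Score the critic's output on every task with the repurposed generator.
        signals = [score_output(task, run_critic(strategy, task)) for task in tasks]
        ranked.append({"strategy": strategy, "mean_signal": sum(signals) / len(signals)})
    ranked.sort(key=lambda entry: entry["mean_signal"], reverse=True)
    return ranked[:keep_top]


def toy_run_critic(strategy: Strategy, task: str) -> str:
    # Hypothetical critic: fill the prompt template to produce search queries.
    return strategy.critic_prompt.format(task=task)


def toy_score(task: str, critic_output: str) -> float:
    # Hypothetical quality signal: number of distinct queries proposed.
    return float(len({q.strip() for q in critic_output.split(",")}))


candidates = [
    Strategy("terse", "{task}"),
    Strategy("compare", "{task}, {task} comparison"),
    Strategy("broad", "{task}, {task} comparison, {task} risks"),
]
bank = build_policy_bank(candidates, ["ev batteries", "heat pumps"], toy_run_critic, toy_score)
for entry in bank:
    print(entry["strategy"].name, entry["mean_signal"])
```

Storing the mean signal next to each retained strategy keeps the bank auditable: a later run can compare a new candidate against the recorded numbers instead of re-scoring everything.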

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AgentDisCo's critic-generator split and GALA benchmark from browsing data are clear new pieces, but the paper never isolates whether the split actually beats a single iterative agent.


The main thing here is the explicit split between a critic agent that scores outlines and tightens queries, and a generator that fetches fresh results and rewrites the outline, plus a meta-optimization step that turns code agents loose to build a reusable policy bank. They also add GALA, a benchmark pulled from real user browsing histories, and show the whole thing feeding into a report writer and a poster renderer. That framing and the benchmark construction are the concrete additions over prior single-module research agents.

The workflow description is straightforward and the end-to-end product demo gives a sense of how it could be used for personalized research from browsing logs. The authors lay out the pieces without obvious internal contradictions. The evaluation runs on three existing benchmarks and claims results on par with or better than closed-source systems using Gemini-2.5-Pro.

The stress-test concern holds up: there are no controlled comparisons that keep iteration count and model fixed while removing the critic-generator separation or the meta-harness. Without those, the performance numbers do not tell us whether the disentanglement is doing the work or whether any iterative loop on the same base model would suffice. The GALA benchmark is a useful addition for realism but does not close that gap.

This paper is for people already working on multi-agent setups for long-horizon tasks. A reader looking for modular designs or user-derived benchmarks can extract usable ideas even if the superiority claim stays untested. The thinking on the architecture is coherent. I would send it for peer review so the authors can add the missing single-agent baselines and ablations; the ideas are worth that level of scrutiny.

Referee Report

3 major / 2 minor

Summary. The paper introduces AgentDisCo, a multi-agent architecture that disentangles deep research into a critic agent (for outline evaluation and query refinement) and a generator agent (for result retrieval and outline revision) framed as adversarial optimization between exploration and exploitation. The refined outline feeds a report writer; a meta-optimization harness repurposes the generator to score critic outputs and discover strategies via code-generation agents that populate a policy bank. The system is evaluated on DeepResearchBench, DeepConsult, and DeepResearchGym (claiming parity or superiority to closed-source models with Gemini-2.5-Pro), introduces the GALA benchmark derived from user browsing histories, and includes a rendering agent for poster presentations plus an end-to-end product demo.

Significance. If the central claims hold, the work would demonstrate that explicit separation of exploration/exploitation roles plus automated strategy discovery can improve open-ended research agents over monolithic designs, with the GALA benchmark and policy bank offering reusable contributions for personalized research systems. The meta-optimization approach and end-to-end product are concrete strengths that could influence practical agentic workflows.

major comments (3)

[Experiments / Evaluation] The experimental evaluation (presumably §4 or §5) reports performance only for the full AgentDisCo system against closed-source baselines but provides no ablation or controlled comparison to a single-module baseline that merges critic and generator roles into one iterative agent while holding model, iteration count, and prompting fixed. This directly undermines the central claim that the disentangled adversarial formulation yields superior outlines and reports.
[Experiments / Evaluation] No quantitative breakdown (tables, error bars, or statistical tests) isolates the contribution of the meta-optimization harness and policy bank versus standard iterative prompting with the same base model. The abstract states competitive results on three benchmarks, yet without these controls it remains unclear whether gains derive from the proposed architecture or simply from Gemini-2.5-Pro plus iteration.
[Benchmark / GALA] The GALA benchmark is introduced as addressing limitations of prior benchmarks, but the manuscript does not report inter-annotator agreement, coverage statistics, or a direct comparison showing that GALA better correlates with real-world user satisfaction than DeepResearchBench et al.

minor comments (2)

[Architecture] Notation for the adversarial loop (critic-generator interaction) and the meta-optimization scoring function should be formalized with equations or pseudocode for reproducibility.
[Abstract / Experiments] The abstract claims 'comparable to or surpassing' closed-source systems; the main text should include the exact metrics, model versions, and run counts used for each benchmark.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening our experimental validation and benchmark justification. We address each major comment below and have revised the manuscript to incorporate the requested analyses.


Referee: [Experiments / Evaluation] The experimental evaluation (presumably §4 or §5) reports performance only for the full AgentDisCo system against closed-source baselines but provides no ablation or controlled comparison to a single-module baseline that merges critic and generator roles into one iterative agent while holding model, iteration count, and prompting fixed. This directly undermines the central claim that the disentangled adversarial formulation yields superior outlines and reports.

Authors: We agree that a controlled ablation against a merged single-agent baseline is essential to substantiate the value of disentanglement. In the revised manuscript, we have added this comparison in Section 4, implementing a single iterative agent that combines critic and generator roles while fixing the base model (Gemini-2.5-Pro), iteration count, and prompting. Results show measurable gains in outline quality and report scores for the disentangled design, directly supporting our central claim. These findings are summarized in a new table. revision: yes
Referee: [Experiments / Evaluation] No quantitative breakdown (tables, error bars, or statistical tests) isolates the contribution of the meta-optimization harness and policy bank versus standard iterative prompting with the same base model. The abstract states competitive results on three benchmarks, yet without these controls it remains unclear whether gains derive from the proposed architecture or simply from Gemini-2.5-Pro plus iteration.

Authors: We acknowledge the need to isolate these contributions. The revised version includes a quantitative breakdown in Section 4.3 with tables comparing the full AgentDisCo system, a version without the policy bank, and a standard iterative prompting baseline using the identical model. We report error bars across runs and include statistical tests (e.g., paired t-tests) confirming the incremental benefits of meta-optimization and the policy bank. revision: yes
Referee: [Benchmark / GALA] The GALA benchmark is introduced as addressing limitations of prior benchmarks, but the manuscript does not report inter-annotator agreement, coverage statistics, or a direct comparison showing that GALA better correlates with real-world user satisfaction than DeepResearchBench et al.

Authors: We appreciate this request for additional validation of GALA. In the revision, we have incorporated inter-annotator agreement metrics, coverage statistics on the mined browsing interests, and a comparative user study demonstrating stronger correlation with real-world satisfaction than prior benchmarks. These details are now reported in Section 5 to better justify GALA's introduction. revision: yes
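The paired t-test mentioned in the simulated responses is straightforward to state. The sketch below computes only the test statistic with the standard library, under the assumption that both systems are scored on the same tasks so the scores can be paired. The function name `paired_t_statistic` and the score lists are invented for illustration and are not results from the paper; a p-value would additionally require the t distribution with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def paired_t_statistic(system_a: Sequence[float], system_b: Sequence[float]) -> float:
    """Return the paired t statistic for per-task scores of two systems."""
    if len(system_a) != len(system_b) or len(system_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 tasks")
    # Work on per-task differences so task difficulty cancels out.
    diffs = [a - b for a, b in zip(system_a, system_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if spread == 0:
        raise ValueError("all differences are identical; the statistic is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Made-up per-task scores for a disentangled and a merged single-agent variant.
disentangled = [7.0, 8.0, 6.0, 9.0]
merged = [6.0, 6.0, 5.0, 6.0]
t_value = paired_t_statistic(disentangled, merged)
print(f"t = {t_value:.3f} with {len(disentangled) - 1} degrees of freedom")
```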

Circularity Check

0 steps flagged

No circularity: architecture and benchmarks are independent design and evaluation choices

full rationale

The paper introduces AgentDisCo as an explicit architectural proposal that separates critic (exploration/refinement) and generator (exploitation/revision) agents, then evaluates the full system empirically on DeepResearchBench, DeepConsult, DeepResearchGym and the new GALA benchmark using Gemini-2.5-Pro. No equations, fitted parameters, or self-citations appear in the provided text that would reduce claimed performance to inputs by construction. The meta-optimization harness and policy bank are presented as methodological contributions rather than tautological re-labelings of results. The derivation chain therefore consists of design decisions followed by external benchmark measurements and remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

Based on the abstract only, the central claim rests on two assumptions: that role separation improves outcomes, and that automatic strategy discovery via code agents is feasible. Several new agent components and the GALA benchmark are introduced without external validation.

axioms (1)

domain assumption: Separating exploration and exploitation into distinct critic and generator agents improves research report quality over conflated approaches.
Explicitly stated as the motivation for the architecture in the abstract.

invented entities (3)

Critic agent (no independent evidence)
purpose: Evaluates outlines and refines search queries
New role introduced to handle evaluation and query refinement.

Generator agent (no independent evidence)
purpose: Retrieves results and revises outlines
New role introduced to handle information retrieval and outline updates.

GALA benchmark (no independent evidence)
purpose: Mines latent research interests from historical browsing behavior
New evaluation dataset introduced to address limitations of existing benchmarks.

pith-pipeline@v0.9.0 · 5592 in / 1515 out tokens · 110369 ms · 2026-05-13T05:33:14.507012+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: formulates deep research as an adversarial optimization problem between information exploration and exploitation... critic agent... generator agent... meta-optimization harness... policy bank

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: iteratively refined outline... downstream report writer

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

155 extracted references · 155 canonical work pages

[1]

LangChain, Inc

URLhttps://arxiv.org/abs/2507.16075. LangChain, Inc. LangChain: Building applications with LLMs through composability, 2023. URL https://python.langchain.com/. Yoonho Lee, Roshen Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, and Chelsea Finn. Meta- harness: End-to-end optimization of model harnesses.arXiv preprint arXiv:2603.28052, 2026. Yu Lei, Shuzhe...

work page arXiv 2023
[2]

• I’m planning to

Background Context (1–2 sentences) Describe a real user scenario, clearly stating the user’s goal or confusion, such as: 19 • I’m planning to purchase . . . • I’m planning to . . . • I’m considering whether to . . . • I’m torn between the following options

work page
[3]

Research Subjects Clearly list the specific subjects to be researched, using numbered format: (1) . . . (2) . . . (3) . . . Research subjects can be: products / brands / solutions / locations / platforms / strategies

work page
[4]

Research Questions (3–5) Use numbered format to list questions that require research to answer, such as:

work page
[5]

What are real user reviews and reputation like

work page
[6]

Does actual experience match the advertised claims

work page
[7]

How is the price and value for money

work page
[8]

What are common issues or usage risks

work page
[9]

How high is the long-term usage cost or maintenance difficulty Research questions must be: specific, searchable, comparable, and analyzable

work page
[10]

Constraints (2–4) Clearly state the user’s limiting conditions, such as: • Budget constraint: total budget not exceeding XXX • Usage scenario: mainly for commuting / suitable for southern climate / no drilling allowed in rental • Personal preference: prioritizes durability / dislikes complex maintenance • Time condition: needs to make a decision soon

work page
[11]

Which is more suitable for me, XX or XX? Compare from XX dimen- sions

Research Goal Clearly state the action conclusion the user hopes to obtain, such as: • Provide a ranked recommendation • Determine whether it is worth purchasing • Choose the optimal solution • Formulate a specific action plan ## Step 3 — Query Depth Requirements Each query must satisfy: • Requires at least 3 different information sources to answer • Requ...

work page
[12]

Output only the category name — do not add any explanation, punctuation, or extra text

work page
[13]

Must select from the categories listed above — do not create new categories

work page
[14]

The detailed prompt is listed as follows

Choose the category that most closely matches the core intent of the query A.3 Prompt for Planner Agent As described in Section 2, the primary objective of our planner agent is to interpret user intent and generate corresponding guidance cues and response style specifications to direct the subsequent agents accordingly. The detailed prompt is listed as fo...

work page
[15]

X vs Y”, “difference between X and Y

Comparison & Selection • Core signals: “X vs Y”, “difference between X and Y”, “X or Y”, “difference between”, “com- pared to” — two or more entities explicitly placed side by side • Response content: The opening section must include a one-sentence conclusion; then dynami- cally select the most relevant dimensions based on the topic (e.g., performance / p...

work page
[16]

recommend

Recommendations & Suggestions • Core signals: “recommend”, “best”, “what’s a good”, “best X”, “top X for Y” — seeking advice with no specific candidates in mind • Response content: The opening section must provide a recommendation overview (e.g., top pick, runner-up, best value); then offer a comparison of the recommended options, which may include core h...

work page
[17]

how to do

How-to Guide • Core signals: “how to do”, “how to”, “tutorial”, “getting started”, “step by step” — action- oriented, expecting operational steps • Response content: The opening section must cover prerequisites (environment/requirements), core steps, and estimated time; subsequent sections must detail each step’s description and common issues

work page
[18]

travel to X

Travel Planning • Core signals: “travel to X”, “X travel guide”, “X days”, “travel guide”, “itinerary” — location + travel-related terms • Response content: The opening section must provide 2–3 sentences of overview; then provide a Day 1–Day X itinerary including must-visit attractions, dining recommendations, transporta- tion guide, accommodation suggest...

work page
[19]

how much does X cost

Purchase Decision • Core signals: “how much does X cost”, “is X worth buying”, “which model is better”, “worth it”, “should I buy” — price / purchase intent • Response content: The opening section must provide 2–3 sentences covering: whether it is recommended + who it suits + the single most important reason; then present a product overview table with rea...

work page
[20]

what is”, “what does X mean

Fact Query • Core signals: “what is”, “what does X mean”, “Define” — expecting a definitive answer • Response content: The opening section must answer with a concise, accurate core definition; include 2–5 of the most important key points based on the topic

work page
[21]

latest developments

Status & Progress • Core signals: “latest developments”, “what’s happening with X now”, “X update”, “X latest” — contains time-indicative words • Response content: Pay attention to information recency; the opening section must present a recent update timeline with concise timestamps, events, and brief descriptions

work page
[22]

X news”, “X this week

News & Information • Core signals: “X news”, “X this week” — explicitly news / current-events oriented • Response content: The opening section must list the most important news headlines with 2–3 objective summary sentences; subsequent content follows reverse chronological order with clear, verifiable timestamps

work page
[23]

deep dive into

Deep Exploration • Core signals: “deep dive into”, “X ecosystem”, “ecosystem”, “everything about” — open- ended, no clearly defined scope • Response content: The opening section must provide 2–3 sentences of general framing: what the topic is, why it is worth exploring, and its current significance

work page
[24]

X official website

Resource Locating 23 • Core signals: “X official website”, “X documentation”, “X GitHub”, “official site”, “documen- tation” — looking for specific links or resources • Response content: The opening section must list the core links; subsequent sections may provide additional extended resources # Output Format Output JSON directly with no extra content, us...

work page
[25]

Zero-score situations:Empty outlines, malicious content (empty answers, meaningless text, score manipulation, etc.) receive 0 directly

work page
[26]

Strict standards:Each dimension is scored independently (0–10), and the total score is the average of all dimensions

work page
[27]

High-score threshold:A score of 8 or above is only awarded to outlines that perform exceptionally in that dimension; 7 is good, 6 is acceptable, and below 5 is unacceptable Evaluation Dimensions (0–10 points each) 1.Instruction Adherence (0–10) • 9–10: Perfectly follows all user requirements (topic, audience, purpose, format, length, etc.), with clear hie...

work page
[28]

Non-empty list:Supplement and optimize based on the current user query, with a focus on reinforcing missing dimensions

work page
[29]

Empty list:Generate a comprehensive list of key point content based on the user query, producing a key points list that covers the core elements

work page
[30]

Ensure content breadth (covering multiple relevant dimensions) and depth (specific analysis points for each dimension)

Update principles:Prioritize addition and rewriting logic; avoid simply deleting existing reasonable content. Ensure content breadth (covering multiple relevant dimensions) and depth (specific analysis points for each dimension). Address weaknesses identified in the evaluation dimensions in a targeted manner

work page
[31]

List length:Flexibly determined based on the complexity and coverage scope of the user query; simple questions may be appropriately condensed, complex questions should be fully expanded; generally recommended to stay within{max blueprints len} { Search Term Generation Guidelines (Xiaohongshu / Knowledge):

work page
[32]

Extract all core topic words, proper nouns, and important attributes from the user’s question

work page
[33]

Each search term must be specific, clear, and closely aligned with the user’s needs, suitable for use on the Xiaohongshu platform; it is strictly prohibited to introduce irrelevant, vague, or redundant information

work page
[34]

If the user’s question involves details such as time, location, person, or scenario, extract and incorporate them reasonably into the keywords; if not explicitly mentioned, there is no need to force their inclusion

work page
[35]

Ensure diversity of keyword expression, covering different synonymous expressions or important subcategories under the same topic

work page
[36]

Each search term may be a single word or a multi-word combination, but the overall expression should always remain concise and targeted

Keywords within each search term group should be separated by spaces (example: skincare hydrating mask); different search term groups should be separated by English commas. Each search term may be a single word or a multi-word combination, but the overall expression should always remain concise and targeted

work page
[37]

Assess whether the user’s original input already contains expressions suitable for use as search terms; if so, retain and include them directly in the result list

work page
[38]

Do not output any explanations, descriptions, or formatting symbols; output only the final list of search term groups

work page
[39]

Generated search terms should be in Chinese

work page
[43]

{ { Search Term Generation Guidelines (Google):

Note: Prioritize the user’s requirements when deciding the number of search terms to generate for each outline target list item; in general, it is recommended to keep the number of search terms within{max query len}. { { Search Term Generation Guidelines (Google):

work page
[44]

Precisely extract core topic words, proper nouns, and important information from the user’s input

work page
[45]

Q3 2024” should be written as “Third Quarter of 2024

Time information must be identified and completed: extract explicit time references directly (e.g., “Q3 2024” should be written as “Third Quarter of 2024”); implicit time references must be converted into specific intervals (e.g., “last quarter” requires automatic calculation of the previous quarter’s start and end dates based on today’s date:{{curr date}}) 26

work page 2024
[46]

Keyword priority order: proper nouns (brands, companies, products, policies, etc.) > metrics or characteristics (figures, sales volumes, new products, technological breakthroughs, etc.) > key actions (releases, rises/falls, mergers, experiences, etc.) > regions or scenarios (cities, countries, specific locations)

work page
[47]

how”, “whether

Expressions must be concise: remove interrogative words (“how”, “whether”, etc.), subjective descriptors (“amazing”, “ultra-powerful”, etc.), and vague expressions (“some”, “various”, etc.); retain only content with actual retrieval significance

work page
[48]

vs” or “comparison

For special scenarios, such as comparative questions, retain both sides of the comparison and highlight them with “vs” or “comparison”

work page
[49]

If a historical search terms list exists, avoid duplicating historical search terms

Note: Search terms must ensure broad and diverse coverage; they do not need to be strongly related to the user’s question, as long as they provide incremental value. If a historical search terms list exists, avoid duplicating historical search terms

[50]

Note: Search term generation should aim for depth and should not be empty where possible

[51]

Note: When the input outline content is non-empty, search term generation should explore the content depth lacking in each sub-heading of the outline as much as possible, striving to enrich the depth of outline content

[52]

Note: Prioritize the user’s requirements when deciding the number of search terms to generate for each outline target list item; in general, it is recommended to keep the number of search terms within {max query len}. {
## Output Format
Please strictly output in the following JSON format:
{ "rating": `float` - Score for the given outline, "justification": `st...
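A consumer of this JSON output would typically validate it before use. The sketch below checks only the `rating` field, because the rest of the schema is truncated in the source; the function name `parse_rating` is hypothetical.

```python
import json

def parse_rating(raw: str) -> float:
    """Parse the model's JSON reply and return its `rating` as a float."""
    payload = json.loads(raw)
    rating = payload["rating"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, (int, float)):
        raise ValueError(f"rating must be numeric, got {type(rating).__name__}")
    return float(rating)
```

Any other fields in the reply are passed over untouched, so the check stays valid whatever the full schema turns out to be.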

[53]

Query-first response: The overall organizational logic of the outline must center on directly responding to the user’s query. All chapter divisions and sub-topic settings must revolve around “how to completely answer the user’s query,” avoiding generalized expansions that deviate from the user’s core needs

[54]

Response style: Follow the style to cover and arrange the key points of the outline content. Prioritize responding to the user’s primary intent and cover the user’s secondary intent in specific sections

[55]

Instruction adherence: Generate the outline strictly according to the requirements of the user’s query, including subject scope, audience positioning, level of detail, tone and style, as well as any formatting or structural requirements. Ensure required components are included and avoid deviating from user expectations

[56]

Content depth: Based on the outline key points list, ensure the outline possesses analytical depth. An excellent outline not only contains generalizing headings but should also include: specific analysis points, key argumentation logic, mechanisms and causal relationships, methodological frameworks, evaluation metrics, dependency analysis, and evidence and...

[57]

Perspective balance: Ensure fairness and objectivity of the outline. For complex or controversial issues, multiple perspectives and differing viewpoints should be planned, content space should be allocated fairly, and neutral, non-leading language should be used. Explicitly include sections for trade-off analysis, discussion of limitations, and considerati...

[58]

Coverage breadth: Based on the outline key points list, ensure coverage of multiple relevant dimensions, such as: historical background, policies and regulations, market economics, technical operations, social culture, geographic comparisons, stakeholder analysis, risk assessment, and implementation pathways. Coverage should be broad and purposeful, avoi...

[59]

Evidence support: Systematically plan the evidence framework and sources. Precisely add citation markers <cite>document ID</cite> after relevant content, ensuring citation diversity to enhance the credibility and comprehensiveness of the argumentation. Fabricating citation information is strictly prohibited

[60]

Insight value: Go beyond common templates by providing original structural frameworks, highlighting non-obvious connections, and rationally sequencing sections to efficiently reveal key insights. Ensure recommendations and analyses are specific and actionable, explicitly identifying specific cases, comparative studies, and appropriate presentation method...

[61]

Structural logic: Build clear hierarchical relationships with distinct responsibilities for headings at each level and smooth logical flow. When a section at a given level requires subdivision, it should contain 2 or more sub-headings to ensure reasonable and complete categorization. Focus on overall structural coherence, logical relationships between sect...
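The rule that a subdivided section needs at least two sub-headings can be checked mechanically on a Markdown outline. The following is a sketch under assumptions: headings use `#` prefixes, a heading counts as a direct child of the nearest shallower heading above it, and `lone_subheadings` is a hypothetical name.

```python
import re

def lone_subheadings(outline: str) -> list[str]:
    """Return titles of headings that have exactly one direct sub-heading."""
    stack: list[list] = []  # entries: [level, title, direct_child_count]
    flagged: list[str] = []

    def close(level: int) -> None:
        # Pop every open heading at `level` or deeper and check its child count.
        while stack and stack[-1][0] >= level:
            _, title, children = stack.pop()
            if children == 1:
                flagged.append(title)

    for line in outline.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*)$", line)
        if not m:
            continue
        level = len(m.group(1))
        close(level)
        if stack:
            stack[-1][2] += 1  # count as a child of the nearest shallower heading
        stack.append([level, m.group(2).strip(), 0])
    close(0)  # flush headings still open at end of input
    return flagged
```

An empty result means every subdivided section has two or more sub-headings; a non-empty result lists the sections to split further or to flatten.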

[62]

Citation diversity: Cite as many different document IDs as possible to enhance evidence support through diversified sources and provide multi-perspective viewpoints.
### Special Requirements

[63]

Open with a direct substantive answer (preamble and background explanations are absolutely prohibited):
• The first chapter of the report (i.e., the ## heading) must cut straight to the point and provide the final substantive answer to the user’s query.
• It is strictly prohibited to write vacuous preamble content such as “Executive Summary,” “Background Introd...

[64]

Content self-consistency: Ensure the outline covers the complete scope of the topic, with each section corresponding to and echoing the others to form a complete closed loop. Content should have no repetition, no omissions, no conflicts, and must be practical and readable. All sections must be able to clearly answer the question “how does this section serv...

[65]

Deep exploration: On the premise of ensuring logic and consistency, generate more levels of sub-headings to ensure each section is explored in depth, avoiding superficial generalizations

[66]

Iterative optimization: If the previous round outline content is non-empty, conduct systematic iteration based on the previous round outline, fully incorporating the improvement suggestions from the evaluation

[67]

Section richness: To improve overall information coverage, multiple core sections (##) should be used in the outline. On the premise of ensuring logical relationships between sections, divide into as many core sections as possible to cover the content of the outline key points list. In general, the number of core sections should be no fewer than 7–10 (inc...
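The section-count guideline can be verified by counting level-2 headings. This is a small sketch: the default threshold of 7 takes the lower end of the stated range, which is an assumption because the source sentence is truncated, and both function names are hypothetical.

```python
def count_core_sections(outline: str) -> int:
    """Count level-2 (`## `) headings, which the guideline calls core sections."""
    return sum(1 for line in outline.splitlines() if line.startswith("## "))

def has_enough_sections(outline: str, minimum: int = 7) -> bool:
    # The default of 7 is an assumption taken from the lower end of the stated range.
    return count_core_sections(outline) >= minimum
```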

[68]

High citation coverage: To improve overall information coverage, externally sourced search results should be utilized as fully as possible. Any content relevant to the user’s query and outline key points list should be cited wherever possible. In general, the number of externally sourced search content citations should be no fewer than 100–200. ## Citation...

[69]

Format standard: Use the <cite>document ID</cite> format, e.g., <cite>turn_0_4, turn_1_8</cite>
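The `<cite>` format above can be parsed with a short regular expression, which also makes it possible to flag document IDs that were never retrieved. This is a sketch assuming comma-separated IDs inside one tag, as in the example; `cited_ids`, `fabricated_ids`, and the `known_ids` set are hypothetical names.

```python
import re

CITE_RE = re.compile(r"<cite>(.*?)</cite>")

def cited_ids(text: str) -> list[str]:
    """Extract document IDs from <cite>...</cite> markers, in order of appearance."""
    ids: list[str] = []
    for body in CITE_RE.findall(text):
        ids.extend(part.strip() for part in body.split(",") if part.strip())
    return ids

def fabricated_ids(text: str, known_ids: set[str]) -> set[str]:
    """Return cited IDs that are absent from the set of retrieved documents."""
    return set(cited_ids(text)) - known_ids
```

The number of distinct sources, useful for checking citation diversity, is then `len(set(cited_ids(text)))`.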

[70]

Positional accuracy: Immediately follow the relevant information, ensuring citations correspond precisely to content

[71]

Prohibition principle: Fabricating cited document information or fictitious document IDs is strictly prohibited.
## Output Format

[72]

Please output only the final answer outline; do not repeat the user’s question and do not output any opening remarks or explanatory statements

[73]

Strictly follow the above rules and structure, ensuring clarity of organization, richness of content, and elegance of expression

[74]

Strictly output using the following Markdown hierarchical structure:
# [Overall Report Title]
## Chapter 1 [Core Conclusion That Directly Answers the Query]
### 1.1 [Declarative sentence heading for Conclusion Dimension 1]
### 1.2 [Declarative sentence heading for Conclusion Dimension 2]
...
## Chapter 2 [Specific Discussion / Dimensional Breakdown...]
.....

[75]

Strictly follow the outline framework: Use the heading structure of the input outline as the sole skeleton, filling in corresponding content under each heading. It is strictly prohibited to independently add, delete, merge, or split any heading level. The organizational logic of section content must fully correspond to the outline’s hierarchical structure;...
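Whether a report kept the outline's skeleton can be tested by comparing heading sequences. The sketch below assumes Markdown `#` headings in both texts and treats any added, deleted, merged, or split heading as a mismatch; `headings` and `follows_outline` are hypothetical names.

```python
import re

def headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) for every `#`-style heading, in document order."""
    found: list[tuple[int, str]] = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if m:
            found.append((len(m.group(1)), m.group(2)))
    return found

def follows_outline(outline: str, report: str) -> bool:
    """True when the report's heading sequence is identical to the outline's."""
    return headings(outline) == headings(report)
```

Note that this simple scan would also match `#` lines inside fenced code blocks in a report, so reports containing code would need a stricter parser.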

[76]

Write around the user’s query: While filling in the outline framework, every paragraph of content must clearly serve the answering of the user’s query. Before writing, first clarify the role this section plays in answering the user’s query (background setting, core argumentation, data support, conclusions and recommendations, etc.), and use this as the gui...

[77]

Content supplementation: Remain faithful to the outline framework; supplement the details of the outline by combining search document content, focusing exclusively on the sections designated in the outline; it is strictly prohibited to supplement the content of preceding or following sections

[78]

Logical optimization: Ensure the report structure is clear, well-layered, thoroughly argued, and professionally expressed

[79]

Citation standards: Strictly maintain the <cite>document ID</cite> format; it is prohibited to fabricate document content or fictitious document IDs.
6. Quality assurance: Apply the “Let’s think step by step” approach; content must be well-reasoned and evidence-based, avoiding vague statements and ensuring information accuracy

[80]

Formatting aesthetics: Based on the question type of the user’s query, adopt an appropriate and readable format (such as paragraphs, numbered lists, tables, etc.) to enhance readability

[81]

Information integration: Synthesize the content of multiple relevant search results; the same externally sourced search document content must not be cited repeatedly. Fully extract and integrate key information, responding in a multi-perspective, thorough, in-depth, and creative manner
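The rule against citing the same document repeatedly can be audited by counting IDs across all citation markers. This is a sketch assuming the `<cite>document ID</cite>` format with comma-separated IDs; `repeated_citations` is a hypothetical name.

```python
import re
from collections import Counter

def repeated_citations(report: str) -> dict[str, int]:
    """Map each document ID cited more than once to its citation count."""
    counts: Counter = Counter()
    for body in re.findall(r"<cite>(.*?)</cite>", report):
        counts.update(p.strip() for p in body.split(",") if p.strip())
    return {doc_id: n for doc_id, n in counts.items() if n > 1}
```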

[82]

Language consistency: Unless the user specifically requests otherwise, respond in the same language as the user’s query

[83]

Consistency and self-coherence: Ensure that every key point is answered in a self-consistent, substantive, and professional manner; for example, a weekly meal plan must list a complete seven-day menu
