{"total":23,"items":[{"citing_arxiv_id":"2606.31504","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SimpleSearch-VL: A Simple Recipe for Multimodal Agentic Deep Search","primary_cat":"cs.CV","submitted_at":"2026-06-30T11:22:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SimpleSearch-VL improves Qwen3-VL multimodal agent baselines by 15.8-16 points on average using 7K total training examples and reaches parity with Gemini-3-Pro on the 30B variant.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.27330","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Empowering GUI Agents via Autonomous Experience Exploration and Hindsight Experience Utilization for Task Planning","primary_cat":"cs.CL","submitted_at":"2026-06-25T17:44:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEEU enables a 7B MLLM to reach 30.6% accuracy on GUI task planning by autonomous exploration and hindsight experience synthesis, outperforming a 32B model through stronger high-level OOD generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24233","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Latent Visual States for Efficient Multimodal Reasoning","primary_cat":"cs.CV","submitted_at":"2026-06-23T07:22:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"EVA generates adaptive Latent_slot tokens as internal visual thoughts, trained end-to-end with text tokens via D-GSPO on the EVA-230K dataset, claiming performance gains and better inference 
efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20122","ref_index":28,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ScaffoldAgent: Utility-Guided Dynamic Outline Optimization for Open-Ended Deep Research","primary_cat":"cs.AI","submitted_at":"2026-06-18T11:47:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ScaffoldAgent improves long-form report generation by modeling outline evolution as expansion, contraction, and revision guided by a utility function estimating downstream value.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12191","ref_index":269,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application","primary_cat":"cs.CL","submitted_at":"2026-06-10T15:15:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"This survey categorizes agentic environments for LLMs by eight attributes and domains, introduces symbolic and neural synthesis paradigms with evaluation, and outlines four agent evolution pathways plus three environment evolution paradigms.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":",BAGEL [247], OS-Genesis [248], Insta [249], APIGen-MT [250], WebShaper [251], WebWatcher [252], WebExplorer [253], AutoPlay [254], CRMWeaver [255], WebLeaper [256], etc. Trajectory Synthesis (§6.3.2) e.g., ToolAlpaca [257], Lingma SWE-GPT [258], Aguvis [259], FlowReasoner [260], WebSynthesis [261], AgentFold [262], ToolACE-MCP [263], ProAct [264], etc. 
Trajectory Refinement (§6.3.3) e.g., Toolformer [265], ETO [266], Self-Improvement [267], GUI-Reflection [268], TiG [269], AgentFrontier [270], WebSTAR [271], SynthAgent [272], TopoCurate [273], etc. Exploration-Centric Online Evolution (§6.4) Reasoning Structure (§6.4.1) e.g., DeepRetrieval [274], Search-R1 [275], AutoRefine [276], SEEA-R1 [277], M3-Agent [278], Video-Thinker [279], ReSearch [280], etc. Reward Shaping (§6.4.2) e.g., Agent-R1 [281], ToolRL [8], Chain-of-Agents [245], VRAG-RL [282], GDPO [283], FlowSteer [284], Tool-N1 [285], ToolOrchestra [286], etc."},{"citing_arxiv_id":"2606.12087","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents","primary_cat":"cs.CL","submitted_at":"2026-06-10T13:49:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"FORT synthesizes shortcut-resistant search tasks by controlling four identified shortcut risks across entity selection, graph construction, question formulation, and refinement, producing training data that yields agents with longer search trajectories and top performance among open-source models on","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11926","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Generalist Autonomous Research via Hypothesis-Tree Refinement","primary_cat":"cs.CL","submitted_at":"2026-06-10T10:57:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench 
Lite.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09138","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-06-08T07:35:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Claw-R1 provides a Gateway Server and Data Pool to manage step-level agent interaction traces as structured data assets for agentic RL training.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07689","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Struct-Searcher: Agentic Structural Thinking Advances Multimodal Deep Information Seeking","primary_cat":"cs.CV","submitted_at":"2026-06-05T06:25:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Struct-Searcher introduces a structural agentic workflow grounded in belief revision theory that maintains an evolving multimodal graph for conflict-aware deep information seeking and reports accuracy gains on several VL benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04703","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Continual Experience Internalization for Self-Evolving LLM Agents","primary_cat":"cs.CL","submitted_at":"2026-06-03T10:30:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Existing methods for turning LLM interaction experience into parametric skills collapse over multiple iterations; principle-level experience, step-wise injection, and 
off-policy teacher distillation yield more stable continual learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22138","ref_index":95,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Efficient Agentic Reasoning Through Self-Regulated Simulative Planning","primary_cat":"cs.AI","submitted_at":"2026-05-21T08:11:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SR²AM achieves competitive Pass@1 accuracy on diverse tasks with 25.8-95.3% fewer reasoning tokens than much larger models by using self-regulated simulative planning trained via supervised learning and RL.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Between underthinking and overthinking: An empirical study of reasoning length and correctness in llms. arXiv preprint arXiv:2505.00127, 2025. [93] Richard S Sutton. Dyna, an integrated architecture for learning, planning, and reacting. ACM Sigart Bulletin, 2(4):160-163, 1991. [94] Richard S Sutton, Andrew G Barto, et al. Reinforcement learning: An introduction, volume 1. MIT press Cambridge, 1998. [95] Zhengwei Tao, Jialong Wu, Wenbiao Yin, Junkai Zhang, Baixuan Li, Haiyang Shen, Kuan Li, Liwen Zhang, Xinyu Wang, Yong Jiang, Pengjun Xie, Fei Huang, and Jingren Zhou. Webshaper: Agentically data synthesizing via information-seeking formalization. arXiv preprint arXiv:2507.15061, 2025. [96] K2 Think Team, Taylor W. 
Killian, Varad Pimpalkhute, Richard Fan, Haonan Li, Chengqian Gao,"},{"citing_arxiv_id":"2605.20876","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Terminal-World: Scaling Terminal-Agent Environments via Agent Skills","primary_cat":"cs.CL","submitted_at":"2026-05-20T08:14:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Terminal-World is a skill-based synthesis pipeline that generates 5,723 training environments and produces Terminal-World-32B which outperforms baselines on Terminal-Bench 2.0 using only 1.2% of the data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13034","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ViDR: Grounding Multimodal Deep Research Reports in Source Visual Evidence","primary_cat":"cs.CV","submitted_at":"2026-05-13T05:39:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ViDR treats source figures as retrievable and verifiable evidence objects in multimodal deep research reports and introduces MMR Bench+ to measure improvements in visual integration and verifiability.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"LLMs to reason over search results through reinforcement learning; WebDancer [25] formulates information seeking as an autonomous agency problem; WebWeaver [15] structures web scale evidence through dynamic outlines; DualGraph [21] separates knowledge exploration from outline structure via jointly evolving knowledge and outline graphs; and WebShaper [22] studies agentic synthesis of training data through formalized information seeking. Recent surveys [31, 9] provide systematic views of this rapidly growing landscape. 
Despite this progress, most deep research agents remain text-centered at the report level. Recent multimodal agents such as WebWatcher [5] and Vision-DeepResearch [8] extend search to multimodal"},{"citing_arxiv_id":"2604.14518","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mind DeepResearch Technical Report","primary_cat":"cs.AI","submitted_at":"2026-04-16T01:20:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MindDR combines a Planning Agent, DeepSearch Agent, and Report Agent with SFT cold-start, Search-RL, Report-RL, and preference alignment to reach competitive scores on research benchmarks using 30B-scale models.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"pairs for the same input, where y+ ∈ D+ and y− ∈ D−. DPO directly optimizes the log-probability Table 2: Performance on five DS benchmarks. Best results in our evaluation environment are shown in bold, and second-best results are underlined. 
Model BrowseComp-ZH BrowseComp xbench-DS GAIA-DS WideSearch Large-Scale Foundation Models GLM-4.6 [50] 45.1 49.5 73.0 52.6 43.1 Kimi K2 [31] 28.8 14.1 50.0 57.7 54.4 DeepSeek R1 [5] 34.6 14.1 50.0 57.7 44.3 Qwen3-235B [44] 31.1 21.7 57.0 63.1 46.4 Comparable-Scale Agent Models WebDancer-32B [40] 25.3 10.5 11.0 63.1 39.7 WebSailor-32B [16] 25.6 14.8 46.0 50.5 40.3 WebShaper-32B [29] 28.0 33.5 53.0 54.4 35.2 MiroThinker-v1.5-30B-A3B [33] 31.9 30.4 5.0 23.3 37.9 OpenSeeker-30B-A3B [8] 26.4 12."},{"citing_arxiv_id":"2604.06777","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Walk the Talk: Bridging the Reasoning-Action Gap for Thinking with Images via Multimodal Agentic Policy Optimization","primary_cat":"cs.CV","submitted_at":"2026-04-08T07:48:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MAPO improves multimodal chain-of-thought reasoning by requiring explicit textual descriptions of visual tool results and using a novel advantage estimator that combines semantic alignment with task rewards.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Yida Zhao, Kuan Li, et al. Webwatcher: Breaking new frontier of vision-language deep research agent. arXiv preprint arXiv:2508.05748, 2025. [49] Kuan Li, Zhongwang Zhang, Huifeng Yin, Liwen Zhang, Litu Ou, Jialong Wu, Wenbiao Yin, Baixuan Li, Zhengwei Tao, Xinyu Wang, et al. Websailor: Navigating super-human reasoning for web agent. arXiv preprint arXiv:2507.02592, 2025. [50] Zhengwei Tao, Jialong Wu, Wenbiao Yin, Junkai Zhang, Baixuan Li, Haiyang Shen, Kuan Li, Liwen Zhang, Xinyu Wang, Yong Jiang, et al. Webshaper: Agentically data synthesizing via information-seeking formalization. arXiv preprint arXiv:2507.15061, 2025. 
[51] Xinji Mai, Haotian Xu, Zhong-Zhi Li, Weinong Wang, Jian Hu, Yingying Zhang, Wenqiang Zhang, et al."},{"citing_arxiv_id":"2604.03679","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LightThinker++: From Reasoning Compression to Memory Management","primary_cat":"cs.CL","submitted_at":"2026-04-04T10:46:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LightThinker++ adds explicit adaptive memory management and a trajectory synthesis pipeline to LLM reasoning, cutting peak token use by ~70% while gaining accuracy in standard and long-horizon agent tasks.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"in base trajectories compared to the Vanilla, these trajectories were decomposed into 42,633 fine-grained training instances. This yields a more potent and logically dense learning signal, providing the model with the necessary supervision to maintain context hygiene and reasoning fidelity in context-heavy tasks. Baselines and Training. We evaluate our framework against several state-of-the-art LLMs, including GLM-4.6 [45], Claude-4-Sonnet [46], GPT-5 [47], Kimi-K2 [48] and Qwen3-235B-A22B-Instruct [49] and the DeepSeek-V3 series (V3.1 and V3.2). To assess the specific impact of explicit memory management, we develop and evaluate two internal variants initialized from Qwen3-30B-A3B-Thinking-2507 [49]. 
The first, Vanilla-Agent, is fine-tuned on the Vanilla Baseline dataset to equip the model with environment-level"},{"citing_arxiv_id":"2603.04751","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating the Search Agent in a Parallel World","primary_cat":"cs.AI","submitted_at":"2026-03-05T02:56:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Mind-ParaWorld creates parallel worlds with atomic facts to evaluate search agents on future scenarios, showing they synthesize evidence well but struggle with collection, coverage, sufficiency judgment, and stopping decisions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.15808","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification","primary_cat":"cs.AI","submitted_at":"2026-01-22T09:47:31+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DeepVerifier enables self-evolving deep research agents via rubric-guided verification at test time, delivering 8-11% accuracy gains on GAIA and XBench-DeepSearch subsets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.11793","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling","primary_cat":"cs.CL","submitted_at":"2025-11-14T18:52:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MiroThinker shows that scaling agent-environment interactions via reinforcement learning lets a 
72B open-source model reach up to 81.9% on GAIA and approach commercial performance on research benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.07969","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search","primary_cat":"cs.CV","submitted_at":"2025-09-09T17:54:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Mini-o3 scales visual search reasoning to tens of interaction turns via a new probe dataset, iterative trajectory collection, and over-turn masking in RL, claiming SOTA performance while training only up to six turns.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.02547","ref_index":284,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The Landscape of Agentic Reinforcement Learning for LLMs: A Survey","primary_cat":"cs.AI","submitted_at":"2025-09-02T17:46:26+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Survey that defines agentic RL for LLMs via POMDPs, introduces a taxonomy of planning/tool-use/memory/reasoning capabilities and domains, and compiles open environments from over 500 papers.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"by assigning intermediate step-level rewards. Atom-Searcher [272] is an agentic deep research framework that significantly improves LLM problem-solving by refining the reasoning process itself, not just the final outcome. WebDancer [106] leverages human browsing trajectory supervision plus RL fine-tuning to produce autonomous ReAct-style agents, excelling on GAIA [283] and WebWalkerQA [284]. 
WebThinker [269] embeds a Deep Web Explorer into a think-search-draft loop, aligning via DPO with human feedback to tackle complex report-generation. WebSailor [105] is a complete post-training methodology designed to teach LLM agents sophisticated reasoning for complex web navigation and information-seeking tasks. WebWatcher [270] further extends to multimodal search, combining visual-language reasoning, tool use, and RL to outperform"},{"citing_arxiv_id":"2508.05748","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent","primary_cat":"cs.IR","submitted_at":"2025-08-07T18:03:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"WebWatcher introduces a vision-language deep research agent trained on synthetic multimodal trajectories and RL that outperforms baselines on VQA benchmarks, along with a new BrowseComp-VL evaluation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.00414","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training","primary_cat":"cs.AI","submitted_at":"2025-08-01T08:11:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Cognitive Kernel-Pro provides an open-source agent framework with curated training data across web, file, code, and reasoning domains plus test-time reflection and voting, achieving SOTA results on GAIA among free agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}