{"total":14,"items":[{"citing_arxiv_id":"2606.11926","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Generalist Autonomous Research via Hypothesis-Tree Refinement","primary_cat":"cs.CL","submitted_at":"2026-06-10T10:57:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29555","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Blind Guess to Informed Judgment: Teaching LLMs to Evaluate Materials by Building Knowledge-Augmented Preference Signals","primary_cat":"cs.CL","submitted_at":"2026-05-28T08:09:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MaterEval generates paired informed and blind evaluations as preference signals to improve small open-source LLMs on high-entropy alloy assessment, approaching closed-source performance without external retrieval.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16986","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents","primary_cat":"cs.CL","submitted_at":"2026-05-16T13:14:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13534","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Scaling Retrieval-Augmented Reasoning with Parallel Search and Explicit Merging","primary_cat":"cs.AI","submitted_at":"2026-05-13T13:46:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MultiSearch uses parallel multi-query retrieval plus explicit merging inside a reinforcement-learning loop to improve retrieval-augmented reasoning, outperforming baselines on seven QA benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10488","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning","primary_cat":"cs.CL","submitted_at":"2026-05-11T12:48:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DeepRefine refines agent-compiled knowledge bases via multi-turn abductive diagnosis and RL training with a GBD reward, yielding consistent downstream task gains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09915","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents","primary_cat":"cs.CL","submitted_at":"2026-05-11T03:07:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference acceptance rates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27859","ref_index":103,"ref_count":3,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rethinking Agentic Reinforcement Learning In Large Language Models","primary_cat":"cs.AI","submitted_at":"2026-04-30T13:43:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper reviews conceptual foundations, methodological innovations, effective designs, critical challenges, and future directions for LLM-based Agentic Reinforcement Learning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"training environments, rigorous benchmarking, and real-device deployment, the ClawGUI [83] framework is introduced. Rather than treating these as disparate stages, ClawGUI provides an integrated lifecycle solution, establishing a cohesive pipeline that spans from reinforcement learning optimization to standardized evaluation and final production release. Researchers introduce MetaClaw [103], a continual meta-learning framework that unifies a base LLM policy with an evolving skill library through two mutually reinforcing mechanisms. Table 1. Comparison of previous works and this work. Ref LLMs Agents RL Designs [5, 18, 23, 30, 33, 45, 47, 62, 77, 81, 84, 85, 95, 101, 122]✓ [42, 49, 68, 100, 105]✓ [2, 14, 22, 34, 37, 56, 59, 69, 98, 120]✓ ✓"},{"citing_arxiv_id":"2605.00043","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms","primary_cat":"cs.DB","submitted_at":"2026-04-29T06:18:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SiriusHelper deploys an LLM agent with intent routing, DeepSearch multi-hop retrieval, and automated SOP distillation to outperform alternatives and reduce ticket volume by 20.8% on Tencent's big data platform.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12890","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Long-horizon Agentic Multimodal Search","primary_cat":"cs.CV","submitted_at":"2026-04-14T15:40:28+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LMM-Searcher uses file-based visual UIDs and a fetch tool plus 12K synthesized trajectories to fine-tune a multimodal agent that scales to 100-turn horizons and reaches SOTA among open-source models on MM-BrowseComp and MMSearch-Plus.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"generation for knowledge-intensive nlp tasks.Advances in neural information processing systems, 33:9459-9474, 2020. [30] Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yixin Dai, Jiawei Sun, Haofen Wang, Haofen Wang, et al. Retrieval-augmented generation for large language models: A survey.arXiv preprint arXiv:2312.10997, 2(1):32, 2023. [31] Yunjia Xi, Jianghao Lin, Yongzhao Xiao, Zheli Zhou, Rong Shan, Te Gao, Jiachen Zhu, Weiwen Liu, Yong Yu, and Weinan Zhang. A survey of llm-based deep search agents: Paradigm, optimization, evaluation, and challenges.arXiv preprint arXiv:2508.05668, 2025. [32] Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Mingwei Chang. Retrieval augmented language model pre-training."},{"citing_arxiv_id":"2604.08224","ref_index":162,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering","primary_cat":"cs.SE","submitted_at":"2026-04-09T13:19:41+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05952","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Towards Trustworthy Report Generation: A Deep Research Agent with Progressive Confidence Estimation and Calibration","primary_cat":"cs.AI","submitted_at":"2026-04-07T14:46:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A deep research agent incorporates progressive confidence estimation and calibration to produce trustworthy reports with transparent confidence scores on claims.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04949","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Retrieve from Agent Trajectories","primary_cat":"cs.IR","submitted_at":"2026-03-30T17:59:02+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Retrievers trained on agent trajectories via the LRAT framework improve evidence recall, task success, and efficiency in agentic search benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.14703","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ToolPRM: Fine-Grained Inference Scaling of Structured Outputs for Function Calling","primary_cat":"cs.AI","submitted_at":"2025-10-16T14:06:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ToolPRM provides fine-grained intra-call process supervision via a new dataset and reward model, outperforming outcome and coarse-grained alternatives on function-calling benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.00861","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs","primary_cat":"cs.CL","submitted_at":"2025-10-01T13:10:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ERL trains LLMs to erase faulty reasoning steps and regenerate them in place, yielding gains of up to 8.48% EM on multi-hop QA benchmarks like HotpotQA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}