{"total":30,"items":[{"citing_arxiv_id":"2606.11926","ref_index":149,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Toward Generalist Autonomous Research via Hypothesis-Tree Refinement","primary_cat":"cs.CL","submitted_at":"2026-06-10T10:57:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22343","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators","primary_cat":"cs.MA","submitted_at":"2026-05-21T11:29:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Sibyl-AutoResearch introduces self-evolving trial-and-error harnesses with auditable conversion units that link trial signals to updated research behaviors and harness repairs in autonomous systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21481","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists","primary_cat":"cs.AI","submitted_at":"2026-05-20T17:59:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AiraXiv is a proposed AI-driven platform for open preprints that supports human and AI authors with interactive UI and MCP-based interactions, validated by serving as the submission system for ICAIS 2025.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22878","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research","primary_cat":"cs.AI","submitted_at":"2026-05-20T16:03:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20025","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration","primary_cat":"cs.AI","submitted_at":"2026-05-19T15:49:51+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17790","ref_index":48,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"STRIDE: A Self-Reflective Agent Framework for Reliable Automatic Equation Discovery","primary_cat":"cs.AI","submitted_at":"2026-05-18T03:14:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"STRIDE is a self-reflective agent framework that improves accuracy, OOD robustness, and structural recovery in LLM-based symbolic regression by integrating generation, evaluation, repair, and diversity-preserving memory.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10530","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery","primary_cat":"cs.IR","submitted_at":"2026-05-11T13:14:54+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery SIGIR '26, July 20-24, 2026, Melbourne, VIC, Australia 6 Related work Deep Research Agents represent a significant evolution from tradi- tional RAG systems by embedding autonomous agents capable of complex reasoning, tool orchestration, and reflection [34, 48]. The feasibility of such multi-agent architectures for sophisticated infor- mation retrieval has been demonstrated by commercial systems like OpenAI [29], Google Gemini [10], and Perplexity [30]. Currently, these deep research pipelines follow two primary methodological approaches: the first emphasizes explicit multi-agent collabora- tion, where distinct agents handle planning, question formulation,"},{"citing_arxiv_id":"2605.09915","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents","primary_cat":"cs.CL","submitted_at":"2026-05-11T03:07:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference acceptance rates.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Demystifying the underground ecosystem of account reg- istration bots. InProceedings of the 30th ACM Joint European Software Engineering Conference and Sympo- sium on the Foundations of Software Engineering, pp. 897-909, 2022. Ghafarollahi, A. and Buehler, M. J. Sciagents: automating scientific discovery through bioinspired multi-agent in- telligent graph reasoning.Advanced Materials, 37(22): 2413523, 2025. Gotoman, J. E. J., Luna, H. L. T., Sangria, J. C. S., San- tiago Jr, C. S., Barbuco, D. D., et al. Accuracy and reliability of ai-generated text detection tools: a literature review.American Journal of IR 4.0 and Beyond, 4(1): 1-9, 2025. Guo, Z., Chen, Z., Nie, X., Lin, J., Zhou, Y ., and Zhang, W. Skillprobe: Security auditing for emerging agent"},{"citing_arxiv_id":"2605.06607","ref_index":26,"ref_count":3,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents","primary_cat":"physics.flu-dyn","submitted_at":"2026-05-07T17:27:23+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures missed by solver checks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01489","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning","primary_cat":"cs.AI","submitted_at":"2026-05-02T15:26:45+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"challenging scientific reasoning dataset that stress-tests long-horizon, multi-evidence problem solving grounded in frontier knowledge and computation. Using SciResearcherQA, we perform agent post-training following an established recipe that begins with supervised fine-tuning using rejection sampling and followed by reinforcement learning via GRPO [31]. Based on Qwen3-8B [47], we trainSciResearcher-8B, which achieves strong perfor- mance across multiple frontier scientific reasoning benchmarks, including HLE-Bio/Chem-Gold [43], SuperGPQA-Hard-Biology [23], and TRQA-Literature [48]. On HLE-Bio/Chem-Gold, our agent attains 19.46% pass@1 and 31.54% pass@3, surpassing existing proprietary agents such as SciMaster"},{"citing_arxiv_id":"2605.04097","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CTM-AI: A Blueprint for General AI Inspired by a Model of Consciousness","primary_cat":"q-bio.NC","submitted_at":"2026-04-30T20:48:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CTM-AI combines a formal consciousness model with foundation models to report state-of-the-art results on sarcasm detection, humor, and agentic tool-use benchmarks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Link fusion. Once links are established, each processor LTMi consults its linked neighbors N(i) in parallel. It poses follow-up queries qi t+1 derived from its updated mem- ory (including the newly broadcast chunk); the neighbors respond via theirexecutefunction, and the initiating proces- sor integrates these responses viawrite: CTMfuse(o, L) = \b [LTM i t K i=1 = \b LTMi t+1(·) K i=1 , [LTM i t = LTMi t \u0010\b LTMj t (o, q j) j∈N(i) \u0011 (9) This unconscious cross-processor integration discovers richer, synergistic information that no single processor could produce alone (Liang et al., 2024; Partan & Marler, 1999). The enriched long-term memories are carried into the next iteration of Step 1, closing the inference loop. ▷ Overall: Iterative inference loop. The CTM theory prescribes a continuous cycle ofprediction, feedback, and learning(Blum & Blum, 2022). CTM-AI preserves this structure and forms Algorithm 1. Prediction. All processors produce chunks from current observations and accumulated memory, then compete via the up-tree to select the conscious content. Feedback. For agentic tasks (e.g., web navigation tasks), motor processors translate the conscious content into actions on the external environment. The environment's response returns as a new observation oT , providing feedback that informs the next step. For non-agentic tasks (e.g., multi- modal perception), no external feedback is available, and the system instead relies on iterative internal refinement. Learning. In the original CTM, learning is realized through the Sleeping Experts Algorithm, which adjusts processor weights based on prediction outcomes. CTM-AI instead leverages in-context learning for self-reported score updates, requiring no parameter updates, through two evolving mech- anisms: (1)memory evolving: broadcast chunks and fused responses are written into each processor's private mem- ory, enriching the context window for future inference; and (2)structural evolving: new links form betwee"},{"citing_arxiv_id":"2604.24198","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis","primary_cat":"cs.CL","submitted_at":"2026-04-27T09:00:30+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[46] Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, and Aviral Kumar. 2025. Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning. In The Thirteenth International Conference on Learning Representations, ICLR 2025, Singapore, April 24-28, 2025. OpenReview.net. https://openreview.net/forum?id= A6Y7AqlzLW [47] Zhihong Shao, Yuxiang Luo, Chengda Lu, Z. Z. Ren, Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, and Xiaokang Zhang. 2025. DeepSeekMath-V2: Towards Self- Verifiable Mathematical Reasoning.CoRRabs/2511.22570 (2025). arXiv:2511.22570 doi:10.48550/ARXIV.2511.22570 [48] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y."},{"citing_arxiv_id":"2604.23136","ref_index":62,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"How Researchers Navigate Accountability, Transparency, and Trust When Using AI Tools in Early-Stage Research: A Think-Aloud Study","primary_cat":"cs.CY","submitted_at":"2026-04-25T04:35:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A think-aloud study reveals that AI tools in early research misrepresent uncertainty, obscure provenance, and create fragile trust, leading researchers to develop compensatory strategies to preserve scholarly judgment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22861","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"IntrAgent: An LLM Agent for Content-Grounded Information Retrieval through Literature Review","primary_cat":"cs.IR","submitted_at":"2026-04-23T01:55:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"IntrAgent uses a two-stage pipeline of section ranking and iterative reading to perform content-grounded literature information retrieval, achieving 13.2% higher accuracy than RAG and agent baselines on the new IntraBench benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.03460","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FermiLink: A Unified Agent Framework for Multidomain Autonomous Scientific Simulations","primary_cat":"physics.chem-ph","submitted_at":"2026-04-03T21:09:19+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"FermiLink is a unified AI agent framework that automates multidomain scientific simulations via separated package knowledge bases and a four-layer progressive disclosure mechanism, reproducing 56% of target figures in benchmarks and generating research-grade results on unpublished problems.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"[12] C. Lu, C. Lu, R. T. Lange, J. Foerster, J. Clune, and D. Ha, The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery, arXiv:2408.06292 (2024). [13] J. Gottweis, W.-H. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, A. Myaskovsky, F. Weissenberger, K. Rong, R. Tanno,et al., Towards an AI co-scientist, arXiv:2502.18864 (2025). [14] S. Schmidgall, Y. Su, Z. Wang, X. Sun, J. Wu, X. Yu, J. Liu, M. Moor, Z. Liu, and E. Barsoum, Agent Lab- oratory: Using LLM Agents as Research Assistants, arXiv:2501.04227 (2025). [15] M. C. Ramos, C. J. Collison, and A. D. White, A re- view of large language models and autonomous agents in chemistry, Chem. Sci.16, 2514 (2025). [16] R. S. K. Gadde, S."},{"citing_arxiv_id":"2604.02360","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Fighting AI with AI: AI-Agent Augmented DNS Blocking of LLM Services during Student Evaluations","primary_cat":"cs.NI","submitted_at":"2026-03-20T21:02:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AI-Sinkhole uses AI classification with quantized LLMs and Pi-Hole DNS blocking to dynamically prevent access to LLM services during student evaluations, reporting F1 scores above 0.83.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.10154","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PRISM-XR: Empowering Privacy-Aware XR Collaboration with Multimodal Large Language Models","primary_cat":"cs.CR","submitted_at":"2026-02-09T21:28:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PRISM-XR adds edge-based sensitive-data filtering and quick registration to MLLM-driven XR collaboration, reporting 90% request accuracy, sub-0.3s registration, and over 90% sensitive-object filtering in a 28-person study.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.15895","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Co-Constructing Alignment: A Participatory Approach to Situate AI Values","primary_cat":"cs.HC","submitted_at":"2026-01-22T12:20:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Misalignments appear in practice as unexpected responses and task breakdowns, with users proposing roles such as adjusting model output, interpreting behavior, or deliberate non-use to co-construct alignment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2601.14289","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension","primary_cat":"cs.CL","submitted_at":"2026-01-14T11:37:00+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RPC-Bench supplies 15K verified QA pairs and a research-flow taxonomy that shows top foundation models still achieve only 68.2 percent correctness-completeness on academic paper comprehension.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.06879","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"WisPaper: Your AI Scholar Search Engine","primary_cat":"cs.IR","submitted_at":"2025-12-07T15:10:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"WisPaper integrates semantic search with agent-based validation, library organization, and personalized AI feeds into a closed-loop system that improves academic paper discovery and long-term awareness.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.01089","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CodeDistiller: Automatically Generating Code Libraries for Scientific Coding Agents","primary_cat":"cs.AI","submitted_at":"2025-11-30T21:19:10+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CodeDistiller distills 250 materials-science GitHub repositories into vetted code libraries that improve the accuracy and scientific soundness of experiments generated by ASD agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.13896","ref_index":38,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GenCellAgent: Generalizable, Training-Free Cellular Image Segmentation via Large Language Model Agents","primary_cat":"q-bio.QM","submitted_at":"2025-10-14T17:02:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"GenCellAgent deploys a planner-executor-evaluator LLM agent loop to automatically select, adapt, and refine segmentation tools for diverse cellular microscopy images, matching or exceeding specialist performance on 4,718 images across seven benchmarks while handling out-of-distribution and novel-ves","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.20328","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Video models are zero-shot learners and reasoners","primary_cat":"cs.LG","submitted_at":"2025-09-24T17:17:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Chen, and Lei Li. Multilingual machine translation with large language models: Empirical results and analysis.arXiv preprint arXiv:2304.04675, 2023. [5] Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The AI Scien- tist: Towards fully automated open-ended scientific discovery.arXiv preprint arXiv:2408.06292, 2024. [6] Samuel Schmidgall, Yusheng Su, Ze Wang, Ximeng Sun, Jialian Wu, Xiaodong Yu, Jiang Liu, Michael Moor, Zicheng Liu, and Emad Barsoum. Agent laboratory: Using llm agents as research assistants.arXiv preprint arXiv:2501.04227, 2025. [7] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al."},{"citing_arxiv_id":"2507.21035","ref_index":98,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis","primary_cat":"cs.AI","submitted_at":"2025-07-28T17:55:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"04227, 2025. [96] S. Schmidgall, Y. Su, Z. Wang, X. Sun, J. Wu, X. Yu, J. Liu, Z. Liu, and E. Barsoum. Agent labora- 18 tory: Using llm agents as research assistants. arXiv preprint arXiv:2501.04227, 2025. [97] D. Shapiro, W. Li, M. Delaflor, and C. Toxtli. Conceptual framework for autonomous cognitive entities. arXiv preprint arXiv: 2310.06775, 2023. [98] M. Shen and Q. Yang. From mind to machine: The rise of manus ai as a fully autonomous digital agent, 2025. [99] C. Si, D. Yang, and T. Hashimoto. Can llms generate novel research ideas? a large-scale human study with 100+ nlp researchers. arXiv preprint arXiv:2409.04109, 2024. [100] K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N."},{"citing_arxiv_id":"2507.11810","ref_index":153,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator","primary_cat":"cs.DL","submitted_at":"2025-07-16T00:11:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Keywords Large language models · Scientific innovation · AI agent · Knowledge discovery · Human-AI collaboration 1 Introduction Science operates as a complex, self-organizing network where knowledge propagates through publications [35]. Scien- tific innovation adds negative entropy to complex scientific networks by reorganizing existing knowledge elements or introducing new knowledge, thereby driving scientific progress [45, 190, 153]. This progress enhances our under- standing of the natural world and human social phenomena, while also fueling technological revolutions that stimulate economic growth and sustainable human development [133]. However, scientific innovation faces intertwined structural and behavioral challenges. Despite the explosive growth of scientific literature, idea expansion remains largely linear,"},{"citing_arxiv_id":"2506.22653","ref_index":18,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"URSA: The Universal Research and Scientific Agent","primary_cat":"cs.AI","submitted_at":"2025-06-27T21:56:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"URSA is a modular agent ecosystem that uses LLMs and scientific tools to accelerate research tasks of varying complexity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.18841","ref_index":29,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning","primary_cat":"cs.CL","submitted_at":"2025-06-23T16:59:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LongWriter-Zero applies RL from a base model with specialized rewards for length, quality, and structure to outperform SFT baselines and larger models on long-writing benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.10060","ref_index":53,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Textual Bayes: Quantifying Prompt Uncertainty in LLM-Based Systems","primary_cat":"cs.LG","submitted_at":"2025-06-11T18:00:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces a Bayesian framework viewing LLM prompts as textual parameters and proposes MHLP, a novel MCMC algorithm using LLM proposals, to perform inference and improve accuracy plus uncertainty quantification on benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.19678","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review","primary_cat":"cs.AI","submitted_at":"2025-04-28T11:08:22+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"to as agentic AI, can generate hypotheses, conduct literature reviews, design experiments, analyze data, accelerate scientific discovery, and reduce research costs [14], [15], [16], [17]. Several frameworks, such as LitSearch, ResearchArena, and Agent Laboratory, have been developed to automate various research tasks, including citation management and academic survey generation [18], [19], [20]. However, challenges persist, especially in executing domain-specific literature reviews and ensuring the reproducibility and reliability of automated pro- cesses [21], [22]. Parallel to these developments in research au- tomation, large language model-based agents have also begun to transform the medical field [23]. These agents are increas-"},{"citing_arxiv_id":"2502.18864","ref_index":293,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Towards an AI co-scientist","primary_cat":"cs.AI","submitted_at":"2025-02-26T06:17:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}