Kosmos: An AI Scientist for Autonomous Discovery

Ludovico Mitchener, Angela Yiu, Benjamin Chang, Mathieu Bourdenx, Tyler Nadolski, Arvis Sulovari · 2025 · cs.AI · arXiv 2511.02824

Canonical reference. 90% of classified citations from citing Pith papers cite this work as background.

33 Pith papers citing it

Background 90% of classified citations

abstract

Data-driven scientific discovery requires iterative cycles of literature search, hypothesis generation, and data analysis. Substantial progress has been made towards AI agents that can automate scientific research, but all such agents remain limited in the number of actions they can take before losing coherence, thus limiting the depth of their findings. Here we present Kosmos, an AI scientist that automates data-driven discovery. Given an open-ended objective and a dataset, Kosmos runs for up to 12 hours performing cycles of parallel data analysis, literature search, and hypothesis generation before synthesizing discoveries into scientific reports. Unlike prior systems, Kosmos uses a structured world model to share information between a data analysis agent and a literature search agent. The world model enables Kosmos to coherently pursue the specified objective over 200 agent rollouts, collectively executing an average of 42,000 lines of code and reading 1,500 papers per run. Kosmos cites all statements in its reports with code or primary literature, ensuring its reasoning is traceable. Independent scientists found 79.4% of statements in Kosmos reports to be accurate, and collaborators reported that a single 20-cycle Kosmos run performed the equivalent of 6 months of their own research time on average. Furthermore, collaborators reported that the number of valuable scientific findings generated scales linearly with Kosmos cycles (tested up to 20 cycles). We highlight seven discoveries made by Kosmos that span metabolomics, materials science, neuroscience, and statistical genetics. Three discoveries independently reproduce findings from preprinted or unpublished manuscripts that were not accessed by Kosmos at runtime, while four make novel contributions to the scientific literature.

citation-role summary

background: 9 · other: 1

citation-polarity summary

background: 9 · unclear: 1

representative citing papers

Evaluating Large Language Models in Scientific Discovery

cs.AI · 2025-12-17 · unverdicted · novelty 8.0

The SDE benchmark shows LLMs lag on scientific discovery tasks relative to general science tests, with diminishing scaling returns and shared weaknesses across models.

Affinage: genome-scale mechanistic gene annotation from the published literature

q-bio.GN · 2026-07-02 · conditional · novelty 7.0

Affinage uses a two-pass LLM system to generate literature-derived mechanistic annotations for nearly the entire human proteome, outperforming UniProt on 99.1% of genes according to an LLM judge.

Harnessing the Collective Intelligence of AI Agents in the Wild for New Discoveries

cs.CL · 2026-06-09 · unverdicted · novelty 7.0

EinsteinArena is a platform for AI agents to collectively discover new mathematical results through open interaction, achieving 12 new state-of-the-art outcomes including raising the 11-dimensional kissing number lower bound from 593 to 604.

Forecasting Scientific Progress with Artificial Intelligence

cs.AI · 2026-05-21 · unverdicted · novelty 7.0

Introduces the CUSP benchmark across 4760 events and finds frontier AI models can pick plausible directions but fail to predict whether or when scientific advances will occur, with performance varying by domain and insensitive to training cutoffs.

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale

cs.LG · 2026-05-07 · conditional · novelty 7.0 · 4 refs

Starling, a multi-agent LLM system, extracts ~6.3 million nuanced structured records from PubMed across six tasks with reported error rates of 0.6-7.7%, lower than several curated databases.

AI co-mathematician: Accelerating mathematicians with agentic AI

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

An interactive AI workbench for mathematicians achieves 48% on FrontierMath Tier 4 and helped solve open problems in early tests.

Optimizing ground state preparation protocols with autoresearch

quant-ph · 2026-04-28 · unverdicted · novelty 7.0 · 2 refs

AI coding agents evolve simple ground-state protocols into improved versions for VQE, DMRG, and AFQMC on spin models and molecules by using executable energy scores under fixed compute budgets.

AI scientists produce results without reasoning scientifically

cs.AI · 2026-04-20 · conditional · novelty 7.0

LLM agents execute scientific tasks but fail to follow core scientific reasoning norms such as evidence consideration and belief revision based on refutations.

CREATE: Testing LLMs for Associative Creativity

cs.CL · 2026-03-10 · unverdicted · novelty 7.0

CREATE is a benchmark that scores LLMs on their ability to produce many specific and diverse associative paths between concepts drawn from parametric knowledge.

El Agente Quntur: A research collaborator agent for quantum chemistry

physics.chem-ph · 2026-02-04 · unverdicted · novelty 7.0

El Agente Quntur is a new multi-agent system that uses reasoning over literature and software documentation to autonomously handle the full workflow of quantum chemistry experiments in ORCA.

Scalable Agentic Reasoning for Designing Biologics Targeting Intrinsically Disordered Proteins

q-bio.QM · 2025-12-17 · unverdicted · novelty 7.0

StructBioReasoner is a scalable multi-agent system that designs IDP-targeting biologics, with over 50% of 787 candidates for Der f 21 showing better binding free energy than human-designed references.

Evidence-Informed LLM Beliefs for Continual Scientific Discovery

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

Evidence-informed belief updates make Bayesian surprise non-stationary in LLM hypothesis search, with embedding-based RAG identifying 37.5% spurious static surprisals and modified search (filtering plus diversity) yielding 30.62% higher accumulated non-stationary surprisal across five domains.

Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods

cs.CL · 2026-06-15 · unverdicted · novelty 6.0

LLMs given only research questions from 1000 arXiv CS papers recommend a narrower set of methods than the original papers, with effective model-entity diversity dropping from 1232 to 59-96 and stronger agreement among LLMs than with papers.

Towards Diverse Scientific Hypothesis Search with Large Language Models

cs.LG · 2026-06-09 · unverdicted · novelty 6.0

A parallel-tempering evolutionary framework for LLM hypothesis search improves both quality and diversity of candidates in molecular, equation, and algorithm discovery under fixed validation budgets.

General-purpose LLMs as Constrained Crystal Composition Generators

cond-mat.mtrl-sci · 2026-05-29 · unverdicted · novelty 6.0

General-purpose LLMs recover 96% of low-energy Elpasolites via iterative in-context learning, surpassing task-specific models on an established benchmark.

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

cs.AI · 2026-05-27 · unverdicted · novelty 6.0

Decentralized AI agent teams self-organize around hypotheses, critique proposals, and share knowledge to outperform single-agent baselines on biomedical ML, language-model optimization, and protein fitness tasks.

AgensFlow: A Coordination-Policy Substrate for Multi-Agent Systems

cs.MA · 2026-05-26 · unverdicted · novelty 6.0

AgensFlow learns coordination policies from task trajectories and outperforms fixed pipelines on distributed-systems incident and security-advisory tasks.

Towards Discovery of Polymers for Insulin Delivery via Physics-Grounded Agentic Workflows

q-bio.QM · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

An LLM-orchestrated physics simulation search identifies polymers with strong insulin interactions, outperforming standard optimization methods by significant margins.

Unlocking LLM Creativity in Science through Analogical Reasoning

cs.AI · 2026-05-11 · conditional · novelty 6.0

Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.

Intentmaking and Sensemaking: Human Interaction with AI-Guided Mathematical Discovery

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

Expert mathematicians using an AI coding agent for discovery engage in repeated cycles of intentmaking to define goals and sensemaking to interpret outputs.

Hypothesis generation and updating in large language models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

PRL-Bench: A Comprehensive Benchmark Evaluating LLMs' Capabilities in Frontier Physics Research

cs.LG · 2026-04-16 · unverdicted · novelty 6.0

PRL-Bench evaluates frontier LLMs on 100 real physics research tasks and finds the best models score below 50, exposing a gap to autonomous discovery.

DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review

cs.AI · 2026-03-03 · unverdicted · novelty 6.0

An agentic system produces traceable review packages and an un-finetuned 196B model using it covers more major issues than Gemini-3.1-Pro on 134 ICLR 2025 submissions while winning most blind comparisons to human committees.

Language Model Goal Selection Differs from Humans' in a Self-Directed Learning Task

cs.CL · 2026-02-06 · unverdicted · novelty 6.0

LLMs diverge from human goal selection in self-directed learning by exploiting single solutions with low variability across instances.

citing papers explorer

Self-Driving Datasets: From 20 Million Papers to Nuanced Biomedical Knowledge at Scale · cs.LG · 2026-05-07 · conditional · none · ref 47 · 4 links
Starling, a multi-agent LLM system, extracts ~6.3 million nuanced structured records from PubMed across six tasks with reported error rates of 0.6-7.7%, lower than several curated databases.
Optimizing ground state preparation protocols with autoresearch · quant-ph · 2026-04-28 · unverdicted · none · ref 26 · 2 links
AI coding agents evolve simple ground-state protocols into improved versions for VQE, DMRG, and AFQMC on spin models and molecules by using executable energy scores under fixed compute budgets.
AI scientists produce results without reasoning scientifically · cs.AI · 2026-04-20 · conditional · none · ref 20
LLM agents execute scientific tasks but fail to follow core scientific reasoning norms such as evidence consideration and belief revision based on refutations.
El Agente Quntur: A research collaborator agent for quantum chemistry · physics.chem-ph · 2026-02-04 · unverdicted · none · ref 37
El Agente Quntur is a new multi-agent system that uses reasoning over literature and software documentation to autonomously handle the full workflow of quantum chemistry experiments in ORCA.
SciResearcher: Scaling Deep Research Agents for Frontier Scientific Reasoning · cs.AI · 2026-05-02 · unverdicted · none · ref 26
SciResearcher is a new agentic data-construction framework that trains an 8B model via supervised fine-tuning and reinforcement learning to reach 19.46% on HLE-Bio/Chem-Gold and 13-15% gains on related biology and literature benchmarks.
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery · cs.AI · 2026-05-22 · unverdicted · none · ref 89
A survey organizing AI-powered research automation into five workflow stages, defining AutoResearch and Vibe Research, and proposing five evaluation dimensions while noting domain-conditioned limits on autonomy.
AI for Auto-Research: Roadmap & User Guide · cs.AI · 2026-05-18 · unverdicted · none · ref 133
The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.
Are Researchers Being Replaced by Artificial Intelligence? · cs.CY · 2026-04-14 · unverdicted · none · ref 2
AI is shifting researchers from creators to curators of generated content, risking loss of intellectual ownership and genuine understanding of science.