The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.
Canonical reference
arXiv preprint arXiv:2404.07738 , doi =
Canonical reference. 71% of citing Pith papers cite this work as background.
citation-role summary
citation-polarity summary
representative citing papers
FARS deployed at scale produced 166 AI/ML papers across 67 topics that received 282 structured human reviews indicating some review-worthy outputs alongside recurring failure modes.
GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over gpt-4o baselines via LLM judges.
ResearchCube provides a 3D spatial interface with bipolar trade-off dimensions and direct-manipulation interactions to support multi-dimensional research ideation, shown helpful in a study with 11 researchers for externalizing thinking and increasing agency.
Introduces the SciGA-145k dataset with intra-paper and cross-paper graphical abstract recommendation tasks plus the CAR evaluation metric.
Scideator enables facet-based scientific ideation through LLM-driven extraction, human-guided recombination, analogous retrieval, and facet-grounded novelty verification, showing significantly higher creativity support than a baseline LLM in a user study with CS researchers.
OPD-Evolver uses on-policy self-distillation in fast interaction and slow attribution loops to build agents with holistic memory competence, outperforming prior systems by up to 11.5% and allowing a 9B model to compete with much larger ones.
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
ResearchEVO automates the discover-then-explain cycle by evolving algorithms via fitness-driven LLM co-evolution and generating grounded, anti-hallucination research papers through sentence-level RAG.
ERA combines LLMs and tree search to produce expert-level empirical software that outperforms top human methods on single-cell analysis leaderboards and CDC COVID-19 forecasts.
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.
A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.
PAPERCLAW is a multi-agent system for end-to-end autonomous research paper generation from literature to output, with human refinement and LLM-judge evaluation showing strong results.
LLM research ideation benefits from exposure to diverse mechanisms across domains but does not yet exploit the specific semantic reasons for cross-domain seed retrieval.
Deep Researcher Agent is a framework for autonomous 24/7 deep learning experimentation by LLM agents using zero-cost monitoring, constant-size memory, and a minimal-toolset multi-agent design.
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
citing papers explorer
-
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.
-
FARS: A Fully Automated Research System Deployed at Scale
FARS deployed at scale produced 166 AI/ML papers across 67 topics that received 282 structured human reviews indicating some review-worthy outputs alongside recurring failure modes.
-
Graphs of Research: Citation Evolution Graphs as Supervision for Research Idea Generation
GoR extracts citation DAGs using position, frequency, predecessor links and time, then fine-tunes Qwen2.5-7B on 498 seed papers to generate ideas, claiming SOTA over gpt-4o baselines via LLM judges.
-
ResearchCube: Multi-Dimensional Trade-off Exploration for Research Ideation
ResearchCube provides a 3D spatial interface with bipolar trade-off dimensions and direct-manipulation interactions to support multi-dimensional research ideation, shown helpful in a study with 11 researchers for externalizing thinking and increasing agency.
-
SciGA: A Comprehensive Dataset for Designing Graphical Abstracts in Academic Papers
Introduces the SciGA-145k dataset with intra-paper and cross-paper graphical abstract recommendation tasks plus the CAR evaluation metric.
-
Human-LLM Compound System for Scientific Ideation through Facet Recombination and Novelty Evaluation
Scideator enables facet-based scientific ideation through LLM-driven extraction, human-guided recombination, analogous retrieval, and facet-grounded novelty verification, showing significantly higher creativity support than a baseline LLM in a user study with CS researchers.
-
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation
OPD-Evolver uses on-policy self-distillation in fast interaction and slow attribution loops to build agents with holistic memory competence, outperforming prior systems by up to 11.5% and allowing a 9B model to compete with much larger ones.
-
Unlocking LLM Creativity in Science through Analogical Reasoning
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
-
ResearchEVO: An End-to-End Framework for Automated Scientific Discovery and Documentation
ResearchEVO automates the discover-then-explain cycle by evolving algorithms via fitness-driven LLM co-evolution and generating grounded, anti-hallucination research papers through sentence-level RAG.
-
An AI system to help scientists write expert-level empirical software
ERA combines LLMs and tree search to produce expert-level empirical software that outperforms top human methods on single-cell analysis leaderboards and CDC COVID-19 forecasts.
-
GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis
GenoMAS deploys six specialized LLM agents with guided planning to preprocess transcriptomic data and identify genes, reaching 89.13% composite similarity and 60.48% F1 on the GenoTEX benchmark while outperforming prior methods.
-
ToolRL: Reward is All Tool Learning Needs
A principled reward design for tool selection and application in RL-trained LLMs delivers 17% gains over base models and 15% over SFT across benchmarks.
-
A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization
A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other pipeline components adding little value.
-
PaperClaw: Harnessing Agents for Autonomous Research and Human-in-the-Loop Refinement
PAPERCLAW is a multi-agent system for end-to-end autonomous research paper generation from literature to output, with human refinement and LLM-judge evaluation showing strong results.
-
Read, Grep, and Synthesize: Diagnosing Cross-Domain Seed Exposure for LLM Research Ideation
LLM research ideation benefits from exposure to diverse mechanisms across domains but does not yet exploit the specific semantic reasons for cross-domain seed retrieval.
-
Deep Researcher Agent: An Autonomous Framework for 24/7 Deep Learning Experimentation with Zero-Cost Monitoring
Deep Researcher Agent is a framework for autonomous 24/7 deep learning experimentation by LLM agents using zero-cost monitoring, constant-size memory, and a minimal-toolset multi-agent design.
-
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
- VERITAS: A Multi-Agent Co-Scientist for Verifiable Image-Derived Hypothesis Testing