Towards execution-grounded automated ai research

Chenglei Si, Zitong Yang, Yejin Choi, Emmanuel Candès, Diyi Yang, Tatsunori Hashimoto · 2026 · arXiv 2601.14525

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

representative citing papers

AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot

cs.AI · 2026-04-15 · conditional · novelty 9.0

AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.

GIANTS: Generative Insight Anticipation from Scientific Literature

cs.CL · 2026-04-10 · unverdicted · novelty 8.0

GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.

Unlocking LLM Creativity in Science through Analogical Reasoning

cs.AI · 2026-05-11 · conditional · novelty 6.0

Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

MLS-Bench shows that current AI agents fall short of reliably inventing generalizable ML methods, with engineering tuning easier than genuine invention.

Agentic Discovery with Active Hypothesis Exploration for Visual Recognition

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

HypoExplore uses LLMs for hypothesis-driven evolutionary search with a Trajectory Tree and Hypothesis Memory Bank to discover lightweight vision architectures, reaching 94.11% accuracy on CIFAR-10 from an 18.91% baseline and generalizing to other datasets including state-of-the-art on MedMNIST.

citing papers explorer

Showing 2 of 2 citing papers after filters.

AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot cs.AI · 2026-04-15 · conditional · none · ref 11
AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.
Unlocking LLM Creativity in Science through Analogical Reasoning cs.AI · 2026-05-11 · conditional · none · ref 38
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.

Towards execution-grounded automated ai research

fields

years

verdicts

representative citing papers

citing papers explorer