AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.
Towards execution-grounded automated ai research
5 Pith papers cite this work. Polarity classification is still indexing.
years
2026 5representative citing papers
GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.
MLS-Bench shows that current AI agents fall short of reliably inventing generalizable ML methods, with engineering tuning easier than genuine invention.
HypoExplore uses LLMs for hypothesis-driven evolutionary search with a Trajectory Tree and Hypothesis Memory Bank to discover lightweight vision architectures, reaching 94.11% accuracy on CIFAR-10 from an 18.91% baseline and generalizing to other datasets including state-of-the-art on MedMNIST.
citing papers explorer
-
AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot
AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.
-
Unlocking LLM Creativity in Science through Analogical Reasoning
Analogical reasoning increases LLM solution diversity by 90-173% and novelty rate to over 50%, delivering up to 13-fold gains on biomedical tasks including perturbation prediction and cell communication.