hub Canonical reference

CoRR , volume =

· 2025 · arXiv 2505.18705

Canonical reference. 80% of citing Pith papers cite this work as background.

21 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 21 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 9 dataset 1

citation-polarity summary

background 8 support 1 use dataset 1

representative citing papers

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

cs.AI · 2026-04-28 · accept · novelty 8.0

AutoResearchBench is a new benchmark showing top AI agents achieve under 10% success on complex scientific literature discovery tasks that demand deep comprehension and open-ended search.

Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research

cs.CE · 2026-06-01 · unverdicted · novelty 7.0

Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.

AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents

physics.flu-dyn · 2026-05-07 · conditional · novelty 7.0

AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures missed by solver checks.

MLReplicate: Benchmarking Autonomous Research Systems for Machine Learning Reproducibility

cs.LG · 2026-05-15 · conditional · novelty 6.0

MLReplicate benchmark evaluates six autonomous systems on 45 manuscripts from ICML 2025 papers, finding that automated reviews accept flawed outputs with fabricated claims while human review exposes methodological failures, and that the cheapest system outperforms the most expensive by a wide margin

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.

FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.

Hypothesis generation and updating in large language models

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.

TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

cs.AI · 2026-04-15 · unverdicted · novelty 6.0

TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.

EvoGens: A Population-Based Heuristic Search Framework for Scientific Idea Generation

cs.CL · 2026-05-29 · unverdicted · novelty 5.0

EvoGens uses rank-based mutation, semantic-aware crossover, and lightweight evaluation to evolve populations of LLM-generated scientific ideas, boosting novelty and diversity metrics.

Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

A multi-agent harness autonomously generates functional single-page VIS apps with linked views for scientific data tasks using coordinated skills for analysis, planning, implementation, and evaluation.

AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists

cs.AI · 2026-05-20 · unverdicted · novelty 5.0

AiraXiv is a proposed AI-driven platform for open preprints that supports human and AI authors with interactive UI and MCP-based interactions, validated by serving as the submission system for ICAIS 2025.

Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery

cs.IR · 2026-05-11 · conditional · novelty 5.0

PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.

Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI

cs.CY · 2026-05-11 · unverdicted · novelty 5.0

AI lowers the cost of generating plausible scientific artifacts without lowering verification costs, so the paper proposes blueprints as typed graph components that decompose claims, evidence, and assumptions to enable cheaper downstream verification.

GEAR: Genetic AutoResearch for Agentic Code Evolution

cs.NE · 2026-05-08 · unverdicted · novelty 5.0

GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.

pAI/MSc: ML Theory Research with Humans on the Loop

cs.AI · 2026-04-22 · unverdicted · novelty 5.0

pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.

AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories

cs.AI · 2026-04-21 · unverdicted · novelty 5.0

AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.

SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research

cs.AI · 2026-05-20 · unverdicted · novelty 4.0

SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.

AI for Auto-Research: Roadmap & User Guide

cs.AI · 2026-05-18 · unverdicted · novelty 4.0

The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.

Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator

cs.DL · 2025-07-16 · unverdicted · novelty 4.0

The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

cs.LG · 2026-05-17

MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI

cs.LG · 2026-05-09

citing papers explorer

Showing 21 of 21 citing papers.

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery cs.AI · 2026-04-28 · accept · none · ref 4
AutoResearchBench is a new benchmark showing top AI agents achieve under 10% success on complex scientific literature discovery tasks that demand deep comprehension and open-ended search.
Matter to Mechanism: A Benchmark for AI Co-Scientists in Materials and Battery Research cs.CE · 2026-06-01 · unverdicted · none · ref 62
Introduces the Matter to Mechanism benchmark of 2,645 structured instances and a composite metric suite for evaluating AI co-scientists on problem-to-hypothesis reasoning in battery materials research.
AI CFD Scientist: Toward Open-Ended Computational Fluid Dynamics Discovery with Physics-Aware AI Agents physics.flu-dyn · 2026-05-07 · conditional · none · ref 31
AI CFD Scientist autonomously discovers a Spalart-Allmaras runtime correction reducing lower-wall Cf RMSE by 7.89% on the periodic hill at Reh=5600 while using a vision-language gate to detect 14 of 16 silent failures missed by solver checks.
MLReplicate: Benchmarking Autonomous Research Systems for Machine Learning Reproducibility cs.LG · 2026-05-15 · conditional · none · ref 33
MLReplicate benchmark evaluates six autonomous systems on 45 manuscripts from ICML 2025 papers, finding that automated reviews accept flawed outputs with fabricated claims while human review exposes methodological failures, and that the cheapest system outperforms the most expensive by a wide margin
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation cs.AI · 2026-05-11 · unverdicted · none · ref 27
NanoResearch introduces a tri-level co-evolving framework of skills, memory, and policy to personalize LLM-powered research automation across projects and users.
FAME: Forecasting Academic Impact via Continuous-Time Manifold Evolution cs.LG · 2026-05-08 · unverdicted · none · ref 28
FAME models scientific topic trajectories in continuous time to forecast paper impact more accurately than LLMs by aligning manuscripts with field momentum in a dynamic latent space.
Hypothesis generation and updating in large language models cs.LG · 2026-05-07 · unverdicted · none · ref 6
LLMs exhibit Bayesian-like hypothesis updating with strong-sampling bias and an evaluation-generation gap but generalize poorly outside observed data.
TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration cs.AI · 2026-04-15 · unverdicted · none · ref 41
TREX automates the LLM training lifecycle via collaborative agents and tree-based exploration, delivering consistent performance gains across 10 real-world fine-tuning tasks in FT-Bench.
EvoGens: A Population-Based Heuristic Search Framework for Scientific Idea Generation cs.CL · 2026-05-29 · unverdicted · none · ref 21
EvoGens uses rank-based mutation, semantic-aware crossover, and lightweight evaluation to evolve populations of LLM-generated scientific ideas, boosting novelty and diversity metrics.
Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks cs.AI · 2026-05-20 · unverdicted · none · ref 28
A multi-agent harness autonomously generates functional single-page VIS apps with linked views for scientific data tasks using coordinated skills for analysis, planning, implementation, and evaluation.
AiraXiv: An AI-Driven Open-Access Platform for Human and AI Scientists cs.AI · 2026-05-20 · unverdicted · none · ref 20
AiraXiv is a proposed AI-driven platform for open preprints that supports human and AI authors with interactive UI and MCP-based interactions, validated by serving as the submission system for ICAIS 2025.
Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery cs.IR · 2026-05-11 · conditional · none · ref 36
PDR is a user-context-aware framework for LLM research agents that improves report relevance over static baselines, supported by a new dataset and hybrid evaluation.
Toward an Engineering of Science: Rebalancing Generation and Verification in the Age of AI cs.CY · 2026-05-11 · unverdicted · none · ref 34
AI lowers the cost of generating plausible scientific artifacts without lowering verification costs, so the paper proposes blueprints as typed graph components that decompose claims, evidence, and assumptions to enable cheaper downstream verification.
GEAR: Genetic AutoResearch for Agentic Code Evolution cs.NE · 2026-05-08 · unverdicted · none · ref 21
GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.
pAI/MSc: ML Theory Research with Humans on the Loop cs.AI · 2026-04-22 · unverdicted · none · ref 65
pAI/MSc is a customizable multi-agent system that reduces human steering by orders of magnitude when turning a hypothesis into a literature-grounded, mathematically established, experimentally supported manuscript draft in ML theory.
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories cs.AI · 2026-04-21 · unverdicted · none · ref 21
AblateCell reproduces baselines in three single-cell perturbation repositories with 88.9% success and recovers ground-truth critical components with 93.3% accuracy via closed-loop ablation.
SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research cs.AI · 2026-05-20 · unverdicted · none · ref 31
SciAtlas builds a large-scale multi-disciplinary academic knowledge graph and a neuro-symbolic retrieval system to support automated scientific research tasks such as literature review and idea positioning.
AI for Auto-Research: Roadmap & User Guide cs.AI · 2026-05-18 · unverdicted · none · ref 200
The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, advocating human-governed collaboration.
Evolving Roles of LLMs in Scientific Innovation: Assistant, Collaborator, Scientist, and Evaluator cs.DL · 2025-07-16 · unverdicted · none · ref 171
The paper proposes a four-role framework for LLMs in scientific innovation and reviews methods, benchmarks, and limitations across Assistant, Collaborator, Scientist, and Evaluator roles.
FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics cs.LG · 2026-05-17 · unreviewed · ref 49
MLS-Bench: A Holistic and Rigorous Assessment of AI Systems on Building Better AI cs.LG · 2026-05-09 · unreviewed · ref 92

CoRR , volume =

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer