AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions, and outperformed baselines at spotting weaknesses.
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (8)
roles: background (1)
polarities: background (1)
representative citing papers
FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.
citing papers explorer
- Lacuna: A Research Map for Machine Learning
  Lacuna is an LLM-powered research map for ML that outperforms OpenScholar on retrieval benchmarks and GPT-Researcher on multi-stage report generation tasks.
- From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent
  ProReviewer is an MDP-formulated proactive peer review agent trained with SFT and RL on an 8B model that outperforms larger frontier LLMs on review quality metrics.
- PRAIB: Peer Review AI Benchmark of Behaviour of LLM-Assisted Reviewing
  PRAIB reveals that, compared with real reviewer feedback, LLM reviews are less variable, positively biased, overconfident, and longer, and that they overlook atomic weaknesses noted by humans.
- Learning What Evaluators Value: A Reliable Approach to Modeling Evaluator Preferences
  Presents a robust algorithm for learning any coordinate-wise non-decreasing evaluator preference function, with theoretical guarantees that it matches linear performance when linearity holds.
- AI for Auto-Research: Roadmap & User Guide
  The paper delivers a stage-by-stage roadmap for AI in research, showing reliable assistance in retrieval and tool tasks but fragility in novelty and judgment, and advocating human-governed collaboration.
- AI Scientists as Engines of Discovery: A Case for Development within Reformed Institutions
  Position paper arguing that multi-agent AI systems can become AI scientists and calling for reformed scientific institutions to support their development, with emphasis on verification and dual-use safety.