CoCoReviewBench curates 3,900 ICLR and NeurIPS papers into category-specific subsets with discussion-based annotations to evaluate AI reviewers on completeness and correctness rather than human review overlap.
2 Pith papers cite this work.
citing papers explorer
- CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
  CoCoReviewBench curates 3,900 ICLR and NeurIPS papers into category-specific subsets with discussion-based annotations to evaluate AI reviewers on completeness and correctness rather than human review overlap.
- Agent Laboratory: Using LLM Agents as Research Assistants
  Agent Laboratory is an autonomous LLM framework that completes end-to-end research from idea to report and code, with human feedback improving quality and cutting expenses by 84% while reaching competitive ML performance.