From replication to redesign: Exploring pairwise comparisons for LLM-based peer review
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
verdicts
UNVERDICTED 2
roles
background 1
polarities
background 1
representative citing papers
Controlled prompt interventions reveal strong affiliation bias in LLM peer reviews favoring top-ranked institutions, plus effects from seniority and publication history.
citing papers explorer
- ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review
  ARA uses LLMs to build workflow graphs linking sources, methods, and outputs in papers, then scores reproducibility, reaching ~61% accuracy on 213 ReScience C articles and outperforming priors on ReproBench and GoldStandardDB.
- Justice in Judgment: Unveiling (Hidden) Bias in LLM-assisted Peer Reviews
  Controlled prompt interventions reveal strong affiliation bias in LLM peer reviews favoring top-ranked institutions, plus effects from seniority and publication history.