AI peer reviewers for POMP analyses show jagged performance: strong on technical error detection and invalid inference but weak on interpretive errors, narrative coherence, and domain-informed critique.
J Am Stat Assoc 32:675–701
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
LLaMA 3.1 extracts visual rating scores from Dutch neuroradiology reports with 87-96% balanced accuracy but only 66-80% on numerical counts, with few-shot prompting raising the latter to 81-92%.
Jagged AI in Scientific Peer Review: Evidence from POMP Data Analysis