AI peer reviewers for POMP analyses show jagged performance: strong on technical error detection and invalid inference but weak on interpretive errors, narrative coherence, and domain-informed critique.
J Am Stat Assoc 32:675–701
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model
LLaMA 3.1 extracts visual rating scores from Dutch neuroradiology reports with 87-96% balanced accuracy but only 66-80% on numerical counts, with few-shot prompting raising the latter to 81-92%.