CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?

Jiefu Ou, William Walden, Kate Sanders, Zhengping Jiang, Kaiser Sun, Jeffrey Cheng · 2025 · DOI 10.18653/v1/2025.findings-emnlp.1185

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

cs.SE · 2026-05-01 · conditional · novelty 8.0

AutoMat benchmark shows current LLM coding agents achieve at most 54.1% success when reproducing computational materials science claims from papers.

Can Coding Agents Reproduce Findings in Computational Materials Science? cs.SE · 2026-05-01 · conditional · none · ref 53