Recognition: unknown
Peerispect: Claim Verification in Scientific Peer Reviews
Pith reviewed 2026-05-10 05:15 UTC · model grok-4.3
The pith
Peerispect extracts check-worthy claims from peer reviews and verifies them against the manuscript using retrieval and natural language inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Peerispect is presented as a modular information retrieval pipeline that extracts check-worthy claims from peer reviews, retrieves relevant evidence from the manuscript, and verifies the claims through natural language inference, with results displayed through a visual interface that highlights evidence directly in the paper.
What carries the argument
The modular IR pipeline consisting of claim extraction, evidence retrieval, and NLI-based verification, supported by an interactive visual interface.
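The manuscript describes this pipeline only at the architectural level. As a minimal sketch of how such a modular arrangement could be wired, assuming hypothetical interface names (Claim, Evidence, ClaimExtractor, Retriever, Verifier, Pipeline) rather than the released Peerispect API:

    # Minimal sketch of a claim-extraction -> retrieval -> NLI-verification pipeline.
    # All names here are illustrative assumptions, not the released Peerispect API.
    from dataclasses import dataclass
    from typing import List, Protocol, Tuple

    @dataclass
    class Claim:
        text: str          # check-worthy statement extracted from the review
        review_id: str     # review it was extracted from

    @dataclass
    class Evidence:
        passage: str       # manuscript passage retrieved as evidence
        location: str      # e.g. section or paragraph identifier
        score: float       # retrieval / reranking score

    class ClaimExtractor(Protocol):
        def extract(self, review_text: str, review_id: str) -> List[Claim]: ...

    class Retriever(Protocol):
        def retrieve(self, claim: Claim, manuscript: List[str], k: int) -> List[Evidence]: ...

    class Verifier(Protocol):
        def verify(self, claim: Claim, evidence: List[Evidence]) -> str: ...

    @dataclass
    class Pipeline:
        extractor: ClaimExtractor
        retriever: Retriever
        verifier: Verifier

        def run(self, review_text: str, review_id: str,
                manuscript: List[str]) -> List[Tuple[Claim, List[Evidence], str]]:
            results = []
            for claim in self.extractor.extract(review_text, review_id):
                evidence = self.retriever.retrieve(claim, manuscript, k=5)
                results.append((claim, evidence, self.verifier.verify(claim, evidence)))
            return results

Typing each stage against a small interface is what would make retrievers, rerankers, and verifiers swappable without touching the rest of the pipeline, which is the property the core claim rests on.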
Load-bearing premise
That current retrievers and natural language inference models can accurately process the specialized and often implicit language used in scientific peer review claims.
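The paper does not name the NLI model it relies on. A minimal sketch of the verification step this premise hinges on, assuming a generic off-the-shelf entailment checkpoint (roberta-large-mnli is chosen here only for illustration):

    # Hedged sketch of NLI-based verification with a generic MNLI checkpoint.
    # The checkpoint and label mapping are assumptions, not Peerispect's actual verifier.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    MODEL_NAME = "roberta-large-mnli"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

    def nli_verdict(evidence_passage: str, review_claim: str) -> str:
        # Premise = manuscript evidence, hypothesis = the reviewer's claim.
        inputs = tokenizer(evidence_passage, review_claim,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        label_id = int(logits.argmax(dim=-1))
        return model.config.id2label[label_id]  # CONTRADICTION / NEUTRAL / ENTAILMENT

Mapping ENTAILMENT to "supported", CONTRADICTION to "refuted", and NEUTRAL to "not enough info" is the conventional move; the open question flagged above is whether models trained on generic NLI pairs transfer to the hedged, implicit phrasing of review claims.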
What would settle it
A set of peer reviews with manually annotated check-worthy claims, gold evidence locations, and verification outcomes, on which the system consistently retrieves the wrong sections or makes incorrect verification decisions.
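No such annotated set is reported in the paper; the sketch below only shows how one would score the pipeline against it once it exists, with the gold-record field names assumed for the example.

    # Hedged sketch of scoring against a (hypothetical) manually annotated gold set.
    from typing import Dict, List, Tuple

    def evaluate(gold: List[dict],
                 outputs: Dict[str, Tuple[List[str], str]]) -> Dict[str, float]:
        """gold: records with 'claim_id', 'gold_location', 'gold_label'.
        outputs: claim_id -> (retrieved_locations, predicted_label)."""
        retrieval_hits = 0
        label_hits = 0
        for record in gold:
            locations, predicted = outputs[record["claim_id"]]
            if record["gold_location"] in locations:   # any retrieved passage hits gold
                retrieval_hits += 1
            if predicted == record["gold_label"]:      # supported / refuted / not enough info
                label_hits += 1
        n = len(gold)
        return {"retrieval_recall_at_k": retrieval_hits / n,
                "verification_accuracy": label_hits / n}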
Original abstract
Peer review is central to scientific publishing, yet reviewers frequently include claims that are subjective, rhetorical, or misaligned with the submitted work. Assessing whether review statements are factual and verifiable is crucial for fairness and accountability. At the scale of modern conferences and journals, manually inspecting the grounding of such claims is infeasible. We present Peerispect, an interactive system that operationalizes claim-level verification in peer reviews by extracting check-worthy claims from peer reviews, retrieving relevant evidence from the manuscript, and verifying the claims through natural language inference. Results are presented through a visual interface that highlights evidence directly in the paper, enabling rapid inspection and interpretation. Peerispect is designed as a modular Information Retrieval (IR) pipeline, supporting alternative retrievers, rerankers, and verifiers, and is intended for use by reviewers, authors, and program committees. We demonstrate Peerispect through a live, publicly available demo (https://app.reviewer.ly/app/peerispect) and API services (https://github.com/Reviewerly-Inc/Peerispect), accompanied by a video tutorial (https://www.youtube.com/watch?v=pc9RkvkUh14).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Peerispect, an interactive modular IR system that extracts check-worthy claims from peer reviews, retrieves relevant evidence from the submitted manuscript, and verifies the claims via natural language inference. Results are displayed in a visual interface highlighting evidence in the paper. The work includes a public demo, API services, and a video tutorial, positioning the tool for use by reviewers, authors, and program committees.
Significance. If the pipeline functions reliably on peer-review text, the system could meaningfully support accountability and efficiency in scientific publishing by automating verification of factual grounding at scale. The modular design (allowing alternative retrievers, rerankers, and verifiers) and public release of the demo and API are clear strengths that facilitate adoption and extension.
Major comments (1)
- Abstract and system description: the central claim that Peerispect 'operationalizes claim-level verification' is presented without any quantitative results, error rates, baseline comparisons, human evaluation, or case studies on scientific peer-review language. This leaves the practical effectiveness of the extraction, retrieval, and NLI steps unassessed and makes it impossible to evaluate the weakest assumption, namely that off-the-shelf or fine-tuned models suffice for implicit review claims.
Minor comments (2)
- The manuscript would benefit from a dedicated section detailing the specific models or heuristics used for claim extraction and the prompting strategy for NLI, even though the pipeline is modular (a hedged example of what such detail could look like follows this list).
- Figure captions and interface screenshots should explicitly label which components (retriever, verifier) are active in each view to improve clarity for readers reproducing the demo.
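As an illustration of the level of detail such a section could provide, one hedged way to phrase LLM-based claim extraction is sketched below; the actual prompt, model, and output format used by Peerispect are not documented in the manuscript.

    # Hypothetical claim-extraction prompt; not the prompt Peerispect actually uses.
    CLAIM_EXTRACTION_PROMPT = """You are given one peer review of a scientific manuscript.
    List every check-worthy claim: a statement about the manuscript's content, methods,
    results, or missing material that could in principle be verified against the paper.
    Exclude purely subjective judgments and rhetorical remarks.
    Return one claim per line, rephrased as a standalone declarative sentence.

    Review:
    {review_text}
    """

    def build_extraction_prompt(review_text: str) -> str:
        return CLAIM_EXTRACTION_PROMPT.format(review_text=review_text)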
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need to better substantiate the system's effectiveness. We address the major comment below and outline planned revisions.
Point-by-point responses
- Referee: Abstract and system description: the central claim that Peerispect 'operationalizes claim-level verification' is presented without any quantitative results, error rates, baseline comparisons, human evaluation, or case studies on scientific peer-review language. This leaves the practical effectiveness of the extraction, retrieval, and NLI steps unassessed and makes it impossible to evaluate the weakest assumption, namely that off-the-shelf or fine-tuned models suffice for implicit review claims.
Authors: We acknowledge that the manuscript presents no quantitative benchmarks, error rates, or human evaluations of the pipeline components on peer-review text. The paper's primary contribution is the design of a modular IR system together with its public demo, API, and video tutorial, rather than an empirical study of model performance. We do not claim that off-the-shelf models suffice for implicit claims; the system is explicitly designed to allow substitution of retrievers, rerankers, and verifiers, enabling users to integrate stronger models. To address the concern, the revised manuscript will include a new section with qualitative case studies drawn from real peer reviews, explicit discussion of challenges with implicit and rhetorical claims, and a dedicated limitations section. These additions will provide concrete illustrations of the pipeline in action without overstating generalizability.
Revision: yes
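To make the substitution point concrete, a hedged sketch of swapping in a dense retriever built from Sentence-BERT embeddings and a FAISS index (both cited by the paper) follows; it reuses the hypothetical Claim/Evidence/Pipeline interfaces sketched earlier and is not the released implementation.

    # Hedged sketch: a drop-in dense retriever using sentence-transformers + FAISS.
    # Evidence and Pipeline refer to the hypothetical interfaces sketched above.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    class DenseRetriever:
        def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
            self.encoder = SentenceTransformer(model_name)

        def retrieve(self, claim, manuscript, k):
            # Embed manuscript passages and the claim, then take top-k by cosine similarity.
            passages = self.encoder.encode(manuscript, normalize_embeddings=True)
            index = faiss.IndexFlatIP(passages.shape[1])  # inner product = cosine on normalized vectors
            index.add(np.asarray(passages, dtype="float32"))
            query = self.encoder.encode([claim.text], normalize_embeddings=True)
            scores, ids = index.search(np.asarray(query, dtype="float32"), k)
            return [Evidence(passage=manuscript[i], location=f"passage-{i}", score=float(s))
                    for s, i in zip(scores[0], ids[0])]

    # Only the retriever changes; the extractor and verifier plug in unchanged:
    # pipeline = Pipeline(extractor=my_extractor, retriever=DenseRetriever(), verifier=my_verifier)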
Circularity Check
No significant circularity detected
full rationale
The paper is a descriptive system presentation of Peerispect, an interactive IR pipeline for extracting check-worthy claims from reviews, retrieving manuscript evidence, and applying NLI verification, with a public demo and API. It contains no equations, no fitted parameters, no predictions of quantitative results, and no derivation chain. The central claim is simply that the modular system has been built and demonstrated; there are no self-citations, ansatzes, or uniqueness theorems that reduce the argument to its own inputs by construction. This is a standard non-circular engineering/systems paper.
Axiom & Free-Parameter Ledger
Nothing to record: the paper is a descriptive system presentation with no equations or fitted parameters (see the circularity rationale above).
Reference graph
Works this paper leans on
- [1] Negar Arabzadeh, Sajad Ebrahimi, Ali Ghorbanpour, Soroush Sadeghian, Sara Salamat, Muhan Li, Hai Son Le, Mahdi Bashari, and Ebrahim Bagheri. 2025. Building Trustworthy Peer Review Quality Assessment Systems. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management. 6863–6864. doi:10.1145/3746252.3761436
- [2] Negar Arabzadeh, Sajad Ebrahimi, Soroush Sadeghian, Seyed Mohammad Hosseini, Alireza Daqiq, Hai Son Le, Mahdi Bashari, and Ebrahim Bagheri. 2026. Can LLMs Uphold Research Integrity? Evaluating the Role of LLMs in Peer Review Quality. In Proceedings of the Nineteenth ACM International Conference on Web Search and Data Mining (WSDM '26). 1341–1342. doi:10....
- [3] Negar Arabzadeh, Sajad Ebrahimi, Sara Salamat, Mahdi Bashari, and Ebrahim Bagheri. 2024. Reviewerly: Modeling the Reviewer Assignment Task as an Information Retrieval Problem. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 5554–5555. doi:10.1145/3627673.3679081
- [4]
- [5] Kirsten Bell, Patricia Kingori, and David Mills. 2024. Scholarly publishing, boundary processes, and the problem of fake peer reviews. Science, Technology, & Human Values 49, 1 (2024), 78–104.
- [6] Pradeep Dasigi, Kyle Lo, Iz Beltagy, Arman Cohan, Noah A. Smith, and Matt Gardner. 2021. A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers. arXiv:2105.03011 [cs.CL]. https://arxiv.org/abs/2105.03011
- [7] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. 2024. The Faiss library. arXiv:2401.08281 [cs.LG].
- [8] John A Drozdz and Michael R Ladomery. 2024. The peer review process: past, present, and future. British Journal of Biomedical Science 81 (2024), 12054.
- [9] Sajad Ebrahimi, Soroush Sadeghian, Ali Ghorbanpour, Negar Arabzadeh, Sara Salamat, Muhan Li, Hai Son Le, Mahdi Bashari, and Ebrahim Bagheri. 2025. RottenReviews: Benchmarking Review Quality with Human and LLM-Based Judgments. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM '25). 5642–5649. doi:10.1145/3...
- [10] Sajad Ebrahimi, Sara Salamat, Negar Arabzadeh, Mahdi Bashari, and Ebrahim Bagheri. 2025. exHarmony: Authorship and Citations for Benchmarking the Reviewer Assignment Problem. In European Conference on Information Retrieval. Springer, 1–16. doi:10.1007/978-3-031-88714-7_1
- [11] Prashant Garg. 2020. Problems in peer review. Journal of Clinical and Diagnostic Research (2020).
- [12] Odest Chadwicke Jenkins and Matthew E. Taylor. 2025. AAAI-26 Review Process Update: Scale, Integrity Measures, and Experimental Use of AI-Assisted Reviewing.
- [13] Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. 2023. Efficient Memory Management for Large Language Model Serving with PagedAttention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles.
- [14] Carole J Lee, Cassidy R Sugimoto, Guo Zhang, and Blaise Cronin. 2013. Bias in peer review. Journal of the American Society for Information Science and Technology 64, 1 (2013), 2–17.
- [15] Seth S Leopold. 2015. Increased manuscript submissions prompt journals to make hard choices. Clinical Orthopaedics and Related Research® (2015).
- [16] Rodrigo Nogueira and Kyunghyun Cho. 2020. Passage Re-ranking with BERT. arXiv:1901.04085 [cs.IR]. https://arxiv.org/abs/1901.04085
- [17] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. https://arxiv.org/abs/1908.10084
- [18] Abdelrahman Sadallah, Tim Baumgärtner, Iryna Gurevych, and Ted Briscoe. 2025. The good, the bad and the constructive: Automatically measuring peer review's utility for authors. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 28979–29009.
- [19]
- [20] Richard Smith. 2006. Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine 99, 4 (2006), 178–182.
- [21] Jonathan P Tennant, Jonathan M Dugan, Daniel Graziotin, Damien C Jacques, François Waldner, Daniel Mietchen, Yehia Elkhatib, Lauren B Collister, Christina K Pikas, Tom Crick, et al. 2017. A multi-disciplinary perspective on emergent and future innovations in peer review. F1000Research (2017).
- [22] James Thorne, Andreas Vlachos, Christos Christodoulopoulos, and Arpit Mittal.
- [23] FEVER: a large-scale dataset for fact extraction and VERification. (2018).
- [24] David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, and Hannaneh Hajishirzi. 2020. SciFact: A Benchmark for Fact Checking in Scientific Writing. In Proceedings of EMNLP.
- [25]
- [26] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 technical report. arXiv preprint arXiv:2505.09388 (2025).