RottenReviews: Benchmarking Review Quality with Human and LLM-Based Judgments

Sajad Ebrahimi, Soroush Sadeghian, Ali Ghorbanpour, Negar Arabzadeh, Sara Salamat, Muhan Li, Hai Son Le, Mahdi Bashari, Ebrahim Bagheri · 2025 · arXiv 6252.376150

7 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers

cs.CL · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

CoCoReviewBench curates 3,900 ICLR and NeurIPS papers into category-specific subsets with discussion-based annotations to evaluate AI reviewers on completeness and correctness rather than human review overlap.

PeerPrism: Peer Evaluation Expertise vs Review-writing AI

cs.CL · 2026-04-16 · unverdicted · novelty 7.0

The PeerPrism benchmark demonstrates that state-of-the-art LLM detectors conflate surface text style with intellectual contribution and fail on hybrid human-AI peer reviews.

PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

The PRISM benchmark finds that LLMs match or exceed humans on isolated review dimensions such as novelty verification, but none achieves the balanced performance of human reviewers across depth, flaw prioritization, and constructiveness.

Expand More, Shrink Less: Shaping Effective-Rank Dynamics for Dense Scaling in Recommendation

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

RankElastor mitigates embedding collapse via spectrum-robust token mixing and GLU-based P-FFNs, yielding better performance and scaling on industrial recommendation datasets.

SID-Coord: Coordinating Semantic IDs for ID-based Ranking in Short-Video Search

cs.IR · 2026-04-12 · unverdicted · novelty 5.0

SID-Coord coordinates semantic IDs with hashed item IDs via attention fusion, adaptive gating, and interest alignment, yielding +0.664% long-play rate and +0.369% playback duration gains in production search ranking.

Taiji: Pareto Optimal Policy Optimization with Semantics-IDs Trade-off for Industrial LLM-Enhanced Recommendation

cs.IR · 2026-06-02 · unverdicted · novelty 4.0

Taiji presents an LLM-as-Enhancer system with reverse-engineered CoT data generation and Pareto Optimal Policy Optimization (POPO) to trade off semantic and ID rewards, deployed at Kuaishou serving 400M daily users.

Peerispect: Claim Verification in Scientific Peer Reviews

cs.CL · 2026-04-19 · unverdicted · novelty 4.0

Peerispect extracts claims from peer reviews, retrieves evidence from the manuscript, and verifies them via NLI in a modular pipeline with a visual interface.

