Ask me like I’m human: LLM-based evaluation with for-human instructions correlates better with human evaluations than human judges

· 2025

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

BatteryPass-12K is the first public benchmark dataset for digital battery passport conformance classification, with evaluations of 22 language models showing thinking models achieve the highest F1 scores.

citing papers explorer

Showing 1 of 1 citing paper.

BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

cs.CL · 2026-04-28 · unverdicted · none · ref 32
