Ask me like I’m human: LLM-based evaluation with for-human instructions correlates better with human evaluations than human judges
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task
BatteryPass-12K is the first public benchmark dataset for digital battery passport conformance classification, with evaluations of 22 language models showing thinking models achieve the highest F1 scores.