PRISM benchmark finds LLMs match or exceed humans on isolated review dimensions like novelty verification but none achieve the balanced performance of human reviewers across depth, flaw prioritization, and constructiveness.
From argument diagrams to argumentation mining in texts: A survey.Int
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.
A survey of argumentation mining techniques that reviews models from structured text to social media and proposes a flexible conceptual architecture framework for social media data.
citing papers explorer
-
PRISM: A Multi-Dimensional Benchmark for Evaluating LLM Peer Reviewers
PRISM benchmark finds LLMs match or exceed humans on isolated review dimensions like novelty verification but none achieve the balanced performance of human reviewers across depth, flaw prioritization, and constructiveness.
-
Instructions Shape Production of Language, not Processing
Instructions trigger a production-centered mechanism in language models, with task-specific information stable in input tokens but varying strongly in output tokens and correlating with behavior.