Poller reduces LLM-human disagreement in evaluating Chinese poetry understanding by having LLMs role-play as authors, with reported error reductions of 94.55% and 89.53% on rhetorical techniques and defamiliarization.
five-character quatrain
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
LLMs show an echo chamber bias in evaluating classical Chinese poetry, systematically favoring machine-generated outputs that violate strict form rules over human expert assessments.
citing papers explorer
Capabilities and Evaluation Biases of Large Language Models in Classical Chinese Poetry Generation: A Case Study on Tang Poetry