WIMPE factorizes reference answers into weighted context-bound points and applies alignment (WPA) and conflict penalty (PCP) metrics, yielding higher human correlation than prior rubric or checklist methods across 10 generative tasks.
- Consider semantic similarity: different wording can be accepted as valid coverage as long as the meaning is equivalent and accurate
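The weighted-point idea summarized above can be illustrated with a rough sketch. The formulas here are assumptions for illustration only: the source does not give the actual definitions of WPA or PCP, so `weighted_alignment`, `conflict_penalty`, `combined_score`, and the `Point` type are hypothetical names, and the coverage and conflict judgments (which in practice would come from a judge that accepts semantically equivalent wording) are passed in as plain booleans.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """One point factorized from a reference answer (hypothetical structure)."""
    text: str
    weight: float  # relative importance; the scale is an assumption


def _weighted_share(points, flags):
    """Fraction of total weight carried by the points whose flag is True."""
    total = sum(p.weight for p in points)
    if total == 0:
        return 0.0
    return sum(p.weight for p, flag in zip(points, flags) if flag) / total


def weighted_alignment(points, covered):
    """Assumed stand-in for an alignment metric: weighted share of covered points."""
    return _weighted_share(points, covered)


def conflict_penalty(points, conflicted):
    """Assumed stand-in for a conflict penalty: weighted share of contradicted points."""
    return _weighted_share(points, conflicted)


def combined_score(points, covered, conflicted):
    """Alignment minus penalty, floored at zero (an illustrative combination only)."""
    score = weighted_alignment(points, covered) - conflict_penalty(points, conflicted)
    return max(0.0, score)


# Illustrative data: three reference points with made-up weights.
points = [
    Point("states the main cause", 2.0),
    Point("names the affected component", 1.0),
    Point("gives the recommended fix", 1.0),
]
covered = [True, True, False]      # answer covers the first two points
conflicted = [False, False, True]  # answer contradicts the third point

print(weighted_alignment(points, covered))
print(conflict_penalty(points, conflicted))
print(combined_score(points, covered, conflicted))
```

Keeping the judgments as inputs separates the scoring arithmetic from the semantic matching step, so either can be swapped out independently.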
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Judge Like Human Examiners: A Weighted Importance Multi-Point Evaluation Framework for Generative Tasks with Long-form Answers