- Consider semantic similarity: different wording can be accepted as valid coverage as long as the meaning is equivalent and accurate

**Identify Match Type**:
- Determine whether the [Generated Answer] includes the scoring point content explicitly or implicitly

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Judge Like Human Examiners: A Weighted Importance Multi-Point Evaluation Framework for Generative Tasks with Long-form Answers

cs.CL · 2026-04-13 · unverdicted · novelty 6.0

WIMPE factorizes reference answers into weighted context-bound points and applies alignment (WPA) and conflict penalty (PCP) metrics, yielding higher human correlation than prior rubric or checklist methods across 10 generative tasks.

citing papers explorer


Judge Like Human Examiners: A Weighted Importance Multi-Point Evaluation Framework for Generative Tasks with Long-form Answers

cs.CL · 2026-04-13 · unverdicted · none · ref 14
