GPTScore: Evaluate as You Desire

Fu, Jinlan; Ng, See-Kiong; Jiang, Zhengbao; Liu, Pengfei · 2024 · DOI 10.18653/v1/2024.naacl-long.365

3 Pith papers cite this work.

representative citing papers

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations

cs.CL · 2026-05-13 · unverdicted · novelty 7.0

Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.

AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals

cs.LG · 2026-04-15 · unverdicted · novelty 7.0

AsymmetryZero operationalizes expert preferences as stable evaluation contracts for semantic evals, with a study showing 75.9-89.6% criterion agreement between frontier and compact model juries at 4-5% of the cost.

SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

SCURank ranks multiple summary candidates with Summary Content Units to outperform ROUGE and LLM-based methods in summarization distillation.

citing papers explorer

Creativity Bias: How Machine Evaluation Struggles with Creativity in Literary Translations cs.CL · 2026-05-13 · unverdicted · none · ref 105
Automatic evaluation tools for literary translations correlate poorly with expert human judgments on creativity and exhibit bias favoring machine-translated texts.
AsymmetryZero: A Framework for Operationalizing Human Expert Preferences as Semantic Evals cs.LG · 2026-04-15 · unverdicted · none · ref 2
AsymmetryZero operationalizes expert preferences as stable evaluation contracts for semantic evals, with a study showing 75.9-89.6% criterion agreement between frontier and compact model juries at 4-5% of the cost.
SCURank: Ranking Multiple Candidate Summaries with Summary Content Units for Enhanced Summarization cs.CL · 2026-04-21 · unverdicted · none · ref 7
SCURank ranks multiple summary candidates with Summary Content Units to outperform ROUGE and LLM-based methods in summarization distillation.