Towards a unified multi-dimensional evaluator for text generation. arXiv preprint arXiv:2210.07197.
5 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- CAST: Mitigating Object Hallucination in Large Vision-Language Models via Caption-Guided Visual Attention Steering
  CAST reduces object hallucination in LVLMs by an average of 6.03% across five models and five benchmarks. It identifies caption-sensitive attention heads and applies optimized steering directions to their outputs, adding negligible inference cost (a minimal illustration of head-output steering follows this list).
- Whose Story Gets Told? Positionality and Bias in LLM Summaries of Life Narratives
  A proposed pipeline shows that LLMs introduce detectable race and gender biases when summarizing life narratives, creating potential for representational harm in research.
- ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
  Multi-agent debate among LLMs yields more reliable text evaluations than single-agent prompting by simulating collaborative human judgment.
- G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
  G-Eval prompts GPT-4 with chain-of-thought reasoning and a form-filling paradigm, reaching 0.514 Spearman correlation with human judgments on summarization and beating prior NLG metrics, while noting a bias toward LLM-generated outputs (a correlation sketch follows this list).
- Calibrating Model-Based Evaluation Metrics for Summarization
  A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.
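A minimal sketch of the head-output steering idea mentioned in the CAST entry above: a fixed direction is added to the outputs of a chosen set of attention heads. The `steer_heads` helper, the tensor layout, and the example head indices are illustrative assumptions; they do not reproduce CAST's procedure for finding caption-sensitive heads or optimizing the directions.

```python
import torch

def steer_heads(head_outputs, directions, alpha=1.0):
    """Add a steering direction to the outputs of selected attention heads.

    head_outputs: (batch, seq_len, num_heads, head_dim) per-head outputs
                  (hypothetical layout; real model internals differ).
    directions:   dict mapping head index -> (head_dim,) steering vector.
    alpha:        steering strength.
    """
    steered = head_outputs.clone()
    for head_idx, direction in directions.items():
        # Normalize so alpha alone controls the magnitude of the shift.
        unit = direction / direction.norm()
        steered[:, :, head_idx, :] += alpha * unit
    return steered

# Toy usage: steer heads 3 and 7 of a 12-head layer with random directions.
outputs = torch.randn(2, 16, 12, 64)
dirs = {3: torch.randn(64), 7: torch.randn(64)}
print(steer_heads(outputs, dirs, alpha=0.5).shape)  # torch.Size([2, 16, 12, 64])
```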
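And a small sketch of the meta-evaluation behind the G-Eval figure quoted above: Spearman correlation between metric scores and human judgments over a set of summaries. The scores below are made-up toy numbers; only the use of `scipy.stats.spearmanr` reflects the standard computation.

```python
from scipy.stats import spearmanr

# Toy, made-up scores: one metric score and one human rating per summary.
metric_scores = [0.72, 0.41, 0.88, 0.55, 0.63, 0.30]
human_scores = [4.0, 2.5, 4.0, 3.5, 3.0, 2.0]

rho, p_value = spearmanr(metric_scores, human_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```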