FineSurE: Fine-grained Summarization Evaluation using LLMs

Hwanjun Song, Hang Su, Igor Shalyminov, Jason Cai, Saab Mansour · 2024 · DOI 10.18653/v1/2024.acl-long.51

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Less is More: Quality-Aware Training Data Selection for Scientific Summarization

cs.CL · 2026-06-23 · unverdicted · novelty 6.0

A biomedical summarization dataset of 1.88 million articles is released, and quality-aware selection of training data based on abstract alignment outperforms random sampling on factuality metrics.

EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

EvoRubric is a single-policy RL method that co-evolves a reasoner and a rubric generator with multi-level verification to produce dynamic rewards for open-ended LLM alignment.

Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why

cs.CL · 2026-05-25 · conditional · novelty 6.0

For binary LLM judge validation, Pearson's r, Spearman's ρ, Kendall's τ_b, phi, and the Matthews correlation all reduce to the same number on non-degenerate data; Cohen's κ supplies the extra signal on label-rate drift, and a reporting checklist is provided.

citing papers explorer

Less is More: Quality-Aware Training Data Selection for Scientific Summarization · cs.CL · 2026-06-23 · unverdicted · none · ref 97
EvoRubric: Self-Evolving Rubric-Driven RL for Open-Ended Generation · cs.CL · 2026-05-28 · unverdicted · none · ref 16
Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why · cs.CL · 2026-05-25 · conditional · none · ref 34
