CheckEval: A reliable LLM-as-a-judge framework for evaluating text generation using checklists

Yukyung Lee, JoongHoon Kim, Jaehee Kim, Hyowon Cho, Jaewook Kang, Pilsung Kang, Najoung Kim · 2025 · DOI 10.18653/v1/2025.emnlp-main.796

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

IPO-Mine releases a toolkit and large multimodal dataset for structured analysis of IPO filings and shows state-of-the-art models diverge from human judgments on chart quality and misleadingness.

Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why

cs.CL · 2026-05-25 · conditional · novelty 6.0

For binary LLM judge validation, Pearson's r, Spearman's ρ, Kendall's τ_b, phi, and the Matthews correlation coefficient all coincide in a single value on non-degenerate data. Cohen's κ supplies the extra signal on label-rate drift, and a reporting checklist is provided.
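
The coincidence described in this summary can be checked numerically. The following is a minimal sketch in pure Python, not code from the cited paper: the label vectors `human` and `judge` are toy data invented here, and all function names are illustrative. It computes each coefficient from its textbook definition on binary labels and compares the results with Cohen's κ.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r from the usual covariance / variance definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties in either variable."""
    conc = disc = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: drops out of both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    denom = sqrt((conc + disc + tied_x_only) * (conc + disc + tied_y_only))
    return (conc - disc) / denom


def confusion(human, judge):
    """Return (tp, fp, fn, tn), treating the human label as the reference."""
    tp = sum(1 for h, j in zip(human, judge) if h == 1 and j == 1)
    fp = sum(1 for h, j in zip(human, judge) if h == 0 and j == 1)
    fn = sum(1 for h, j in zip(human, judge) if h == 1 and j == 0)
    tn = sum(1 for h, j in zip(human, judge) if h == 0 and j == 0)
    return tp, fp, fn, tn


def phi_mcc(human, judge):
    """Phi / Matthews correlation: the same formula on a 2x2 table."""
    tp, fp, fn, tn = confusion(human, judge)
    return (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )


def cohen_kappa(human, judge):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, fp, fn, tn = confusion(human, judge)
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Toy labels (assumed for illustration): the judge says "1" more often
# than the human annotator does, i.e. the label rates differ.
human = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
judge = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]

scores = {
    "pearson_r": pearson(human, judge),
    "spearman_rho": spearman(human, judge),
    "kendall_tau_b": kendall_tau_b(human, judge),
    "phi_mcc": phi_mcc(human, judge),
    "cohen_kappa": cohen_kappa(human, judge),
}
for name, value in scores.items():
    print(f"{name:>14}: {value:.4f}")
```

On this toy data the four correlation-style coefficients agree, while Cohen's κ comes out lower because the judge's positive rate (60%) differs from the human's (40%). With identical label rates, κ would match the other coefficients as well.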

Provenance-Grounded Gating and Adaptive Recovery in Synthetic Post-Training Data Curation

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

Controlled experiments on synthetic post-training data show provenance-grounded gating and adaptive recovery improve yield and recall over baselines, with generator scale as the primary driver of downstream fine-tuning quality.
