Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation

2025 · arXiv 2504.14716

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment

cs.CL · 2026-05-24 · unverdicted · novelty 7.0

JudgmentBench supplies the first public paired rubric and preference annotations from legal experts on the same LLM outputs, showing comparative judgments outperform rubrics in recovering quality orderings.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

TrOPD stabilizes on-policy distillation for LLMs with trust-region learning, outlier estimation, and off-policy guidance, outperforming prior OPD methods on reasoning and code benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

JudgmentBench: Comparing Rubric and Preference Evaluation for Quality Assessment

cs.CL · 2026-05-24 · unverdicted · none · ref 8

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning

cs.LG · 2026-04-08 · unverdicted · none · ref 112

Trust Region On-Policy Distillation

cs.LG · 2026-05-31 · unverdicted · none · ref 152
