Bayesian Calibration of Win Rate Estimation with LLM Evaluators

Gao, Yicheng; Xu, Gonghan; Wang, Zhe; Cohan, Arman · 2024 · DOI 10.18653/v1/2024.emnlp-main.273

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Empirical Study on the Characteristics and Evolution of AI-usage in GitHub Repositories: Evidence from Code Comments

cs.SE · 2026-06-05 · unverdicted · novelty 6.0

Analysis of 35k AI-referencing GitHub comments shows primary use for code implementation, with evolution toward conceptual support and sustained human refinement over time.

STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems

cs.LG · 2026-05-04 · unverdicted · novelty 5.0

STABLEVAL produces stable AI system rankings by modeling latent correctness and annotator confusion rather than majority vote aggregation.

citing papers explorer

STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems

cs.LG · 2026-05-04 · unverdicted · none · ref 9

STABLEVAL produces stable AI system rankings by modeling latent correctness and annotator confusion rather than majority vote aggregation.
