Bayesian Calibration of Win Rate Estimation with LLM Evaluators
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
STABLEVAL: Disagreement-Aware and Stable Evaluation of AI Systems
STABLEVAL produces stable AI system rankings by modeling latent correctness and annotator confusion rather than majority vote aggregation.