Cambridge University Press
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2025 (2). Verdicts: unverdicted (2).
Citing papers:
- Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
A Dirichlet-prior Bayesian estimator for model success probability replaces Pass@k, delivering faster-converging and more stable rankings with credible intervals on math benchmarks.
- Diagrammatics of free energies with fixed variance for high-dimensional data
Develops diagrammatic perturbation theory for free energies with fixed variance and applies it to complete the thermodynamic-limit free energy for a spin system while providing resummations for poorly sampled entropies.