Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices , url=
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
Automation in embodied benchmark construction shifts costs from acquisition toward validation, auditability, version control, and long-term governance instead of simply lowering total cost.
citing papers explorer
-
A Tale of Two Variances: When Single-Seed Benchmarks Fail in Bayesian Deep Learning
Single-seed CRPS estimates in limited-data BDL show high variance and peaks for heteroscedastic methods, with local variance correlating above 0.96 to single-seed error.
-
Intelligent Automation for Embodied Benchmark Construction: Pipelines, Embodiments, Simulators, and Trends
Automation in embodied benchmark construction shifts costs from acquisition toward validation, auditability, version control, and long-term governance instead of simply lowering total cost.
- Position: State-of-the-Art Claims Require State-of-the-Art Evidence