A metric learning reality check
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization
  The Portable Regime Score PRS=(B/|A|)(1-rho) captures and predicts acquisition function performance reversals in transfer Bayesian optimization, enabling a RegimePlanner that adapts and beats fixed baselines.
- Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases
  Auditing Sabotage Bench shows frontier LLMs and human auditors achieve at most 0.77 AUROC and 42% top-1 fix rate when trying to detect and correct sabotage in ML codebases.