A metric learning reality check

Kevin Musgrave, Serge Belongie, Ser-Nam Lim · 2020

2 Pith papers cite this work.

representative citing papers

Regime-Conditioned Evaluation in Multi-Context Bayesian Optimization

cs.LG · 2026-05-06 · unverdicted · novelty 7.0 · ref 34

The Portable Regime Score PRS=(B/|A|)(1-rho) captures and predicts acquisition function performance reversals in transfer Bayesian optimization, enabling a RegimePlanner that adapts and beats fixed baselines.

Auditing Sabotage Bench: A Benchmark for Detecting and Fixing Research Sabotage in ML Codebases

cs.AI · 2026-04-17 · unverdicted · novelty 6.0 · ref 42

Auditing Sabotage Bench shows frontier LLMs and human auditors achieve at most 0.77 AUROC and 42% top-1 fix rate when trying to detect and correct sabotage in ML codebases.
