Review history

arxiv: 2605.17373 · 2 revisions

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

  1. 2026-06-30 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    26298 ms 5860 in 1022 out 2026-06-30T18:58:15.038966+00:00
  2. 2026-05-20 ACCEPT MODERATE v0.9.0 novelty 6.0
    37790 ms 5860 in 1537 out 2026-05-20T14:21:19.495102+00:00