pith.

Review history

arxiv: 2606.06324 · 2 revisions

From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws

  1. 2026-07-04 · UNVERDICTED · LOW · v0.9.1-grok · novelty 5.0
    18888 ms · 5822 in · 982 out · 2026-07-04T00:07:59.145292+00:00
  2. 2026-06-28 · UNVERDICTED · LOW · v0.9.1-grok · novelty 6.0
    19665 ms · 5795 in · 1225 out · 2026-06-28T00:06:04.528012+00:00