AgentDevel: Reframing Self-Evolving LLM Agents as Release Engineering
1 Pith paper cites this work.
Fields: cs.SE (1); Years: 2026 (1); Verdicts: UNVERDICTED (1)

Representative citing papers:
From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws
HarnessFix diagnoses harness flaws from agent traces via HTIR, maps them to repair operators, and improves benchmark performance by 6.3-18.4% over baselines.