Autoharness: improving llm agents by automatically synthesizing a code harness

· 2026 · arXiv 2603.03329

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Agentic MIP Research: Accelerated Constraint Handler Generation

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

LLM agents in a solver-aware harness recover global constraints from MIP formulations, generate executable propagation-only handlers for SCIP, and solve five additional MIPLIB 2017 instances.

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.

FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.

AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs

cs.CL · 2026-04-24 · unverdicted · novelty 6.0

AutoPyVerifier learns compact sets of executable Python verifiers from labeled LLM outputs via LLM synthesis and DAG search, improving objective prediction by up to 55 F1 points and downstream LLM accuracy by up to 17 points.

Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.

Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.

The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents

cs.AI · 2026-05-11 · unverdicted · novelty 4.0

Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.

citing papers explorer

Showing 7 of 7 citing papers.

Agentic MIP Research: Accelerated Constraint Handler Generation cs.AI · 2026-05-09 · unverdicted · none · ref 11
LLM agents in a solver-aware harness recover global constraints from MIP formulations, generate executable propagation-only handlers for SCIP, and solve five additional MIPLIB 2017 instances.
Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning cs.CL · 2026-05-09 · unverdicted · none · ref 19
OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration cs.LG · 2026-05-08 · unverdicted · none · ref 16
FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.
AutoPyVerifier: Learning Compact Executable Verifiers for Large Language Model Outputs cs.CL · 2026-04-24 · unverdicted · none · ref 16
AutoPyVerifier learns compact sets of executable Python verifiers from labeled LLM outputs via LLM synthesis and DAG search, improving objective prediction by up to 55 F1 points and downstream LLM accuracy by up to 17 points.
Agentic Forecasting using Sequential Bayesian Updating of Linguistic Beliefs cs.AI · 2026-04-20 · unverdicted · none · ref 31
BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence cs.AI · 2026-04-20 · unverdicted · none · ref 60
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
The Agent Use of Agent Beings: Agent Cybernetics Is the Missing Science of Foundation Agents cs.AI · 2026-05-11 · unverdicted · none · ref 23
Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.

Autoharness: improving llm agents by automatically synthesizing a code harness

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer