Draw2Think recasts geometric reasoning as agentic interaction with a constraint engine, achieving 95.9% predicate-level construction fidelity and up to 16.4% accuracy gains on solid geometry tasks.
hub Mixed citations
Autoharness: improving llm agents by automatically synthesizing a code harness
Mixed citation behavior. Most common role is background (67%).
hub tools
citation-role summary
citation-polarity summary
years
2026 11representative citing papers
LLM agents in a solver-aware harness recover global constraints from MIP formulations, generate executable propagation-only handlers for SCIP, and solve five additional MIPLIB 2017 instances.
OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.
Arbor combines a coordinator, executors, and a hypothesis tree to enable cumulative autonomous research, outperforming Codex and Claude Code by over 2.5x on six real tasks and reaching 86.36% Any Medal on MLE-Bench Lite.
FlashEvolve accelerates LLM agent self-evolution via asynchronous stage orchestration and inspectable language-space staleness handling, reporting 3.5-4.9x proposal throughput gains over synchronous baselines on GEPA workloads.
AutoPyVerifier learns compact sets of executable Python verifiers from labeled LLM outputs via LLM synthesis and DAG search, improving objective prediction by up to 55 F1 points and downstream LLM accuracy by up to 17 points.
BLF achieves state-of-the-art binary forecasting on ForecastBench by using linguistic belief states updated in tool-use loops, hierarchical multi-trial logit averaging, and hierarchical Platt scaling calibration.
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.
Introduces HarnessMutation as a governed mechanism for lifecycle-aware runtime adaptation in agent systems, modeling evolution as a bounded observable process over persistent operational memory.
Agent Cybernetics reframes foundation agent design by adapting classical cybernetics laws into three engineering desiderata for reliable, long-running, self-improving agents.
citing papers explorer
-
Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.