CoRR
4 Pith papers cite this work.
Citing papers

- Deep Reasoning in General Purpose Agents via Structured Meta-Cognition
  DOLORES, an agent using a formal language for meta-reasoning to construct adaptive scaffolds on the fly, outperforms prior scaffolding methods by 24.8% on average across four hard benchmarks and multiple model sizes.
- Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems
  A harness for AI agents enabled construction of a Rust library with 100+ problem types and 200+ reduction rules for NP-hard problems in three months.
- AgentSPEX: An Agent SPecification and EXecution Language
  AgentSPEX is a new language and harness for explicitly specifying and running structured LLM-agent workflows with typed steps, control flow, parallel execution, and a visual editor.
- Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution
  LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.