pith. sign in

arxiv: 2603.23420 · v2 · pith:XV6AYR2Mnew · submitted 2026-03-24 · 💻 cs.AI

Bilevel Autoresearch: Meta-Autoresearching Itself

classification 💻 cs.AI
keywords loopautoresearchinnerouterbilevelmechanismsitselfsearch
0
0 comments X
read the original abstract

If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We present Bilevel Autoresearch, a bilevel framework in which an outer autoresearch loop improves an inner autoresearch loop by reading its code and traces, identifying bottlenecks, and generating injectable Python search mechanisms at runtime. The inner loop optimizes task performance; the outer loop optimizes how the inner loop searches. Both loops use the same LLM, so improvements come from the bilevel architecture rather than a stronger meta-level model, although the outer loop consumes additional inference and wall-clock budget. On Karpathy's GPT pretraining benchmark, the meta-autoresearch outer loop achieves a 5x improvement over the standard inner loop alone (-0.045 vs. -0.009 val_bpb), while parameter-level adjustment without mechanism change yields no reliable gain. The outer loop instantiates mechanisms from adjacent search domains, including combinatorial optimization, multi-armed bandits, and design of experiments, without human specification of the final mechanism design. Trace analysis suggests that these mechanisms break deterministic search patterns and force exploration of directions the LLM's priors avoid. The experiments demonstrate, on this benchmark, a first bilevel step: an outer loop improves the search behavior of an inner loop. Code is the mechanism carrier in this implementation, but skills, prompts, workflows, evaluators, domain principles, world-model assumptions, and memory schemas can also encode mechanisms that shape future agent behavior. This suggests a path toward recursive bootstrapping, where mechanisms discovered for the inner loop can be fed back to improve the meta-level loop itself.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. GEAR: Genetic AutoResearch for Agentic Code Evolution

    cs.NE 2026-05 unverdicted novelty 5.0

    GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.

  2. An Empirical Study of Multi-Agent Collaboration for Automated Research

    cs.MA 2026-03 unverdicted novelty 5.0

    Subagent architectures deliver stable high-throughput optimization under tight time limits while agent teams enable deeper refactoring at the cost of higher fragility.