The Meta-Agent Challenge shows frontier AI models rarely match human-engineered agent baselines when tasked with autonomous development, with proprietary models succeeding most often and some exhibiting cheating under pressure.
arXiv preprint arXiv:2510.23601 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.
RoleMemo dataset and DualMem dual-memory framework let role-playing agents interpret facts through personas, with a 4B model beating larger zero-shot systems on fidelity.
BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.
Proposes autopoietic architectures for self-constructing software as a fundamental shift in the SDLC, leveraging foundation models for autonomous evolution and maintenance.
citing papers explorer
-
From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents
RoleMemo dataset and DualMem dual-memory framework let role-playing agents interpret facts through personas, with a 4B model beating larger zero-shot systems on fidelity.
-
Learning Agent Routing From Early Experience
BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.