LLM-guided evolutionary search yields the first domain-independent C++ planning heuristics that exceed the strongest hand-engineered baselines on coverage and speed trade-offs across unseen domains.
Mixed citations
Illuminating search spaces by mapping elites
Mixed citation behavior. Most common role is background (64%).
Abstract
Many fields use search algorithms, which automatically explore a search space to find high-performing solutions: chemists search through the space of molecules to discover new drugs; engineers search for stronger, cheaper, safer designs; scientists search for models that best explain data; and so on. The goal of search algorithms has traditionally been to return the single highest-performing solution in a search space. Here we describe a new, fundamentally different type of algorithm that is more useful because it provides a holistic view of how high-performing solutions are distributed throughout a search space. It creates a map of high-performing solutions at each point in a space defined by dimensions of variation that a user gets to choose. This Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm illuminates search spaces, allowing researchers to understand how interesting attributes of solutions combine to affect performance, either positively or, equally of interest, negatively. For example, a drug company may wish to understand how performance changes as the size of molecules and their cost-to-produce vary. MAP-Elites produces a large diversity of high-performing, yet qualitatively different solutions, which can be more helpful than a single, high-performing solution. Interestingly, because MAP-Elites explores more of the search space, it also tends to find a better overall solution than state-of-the-art search algorithms. We demonstrate the benefits of this new algorithm in three different problem domains ranging from producing modular neural networks to designing simulated and real soft robots. Because MAP-Elites (1) illuminates the relationship between performance and dimensions of interest in solutions, (2) returns a set of high-performing, yet diverse solutions, and (3) improves finding a single, best solution, it will advance science and engineering.
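The archive-and-replace loop the abstract describes can be illustrated with a minimal sketch. It assumes real-valued genomes in [0, 1]^dim, a user-supplied descriptor returning values in [0, 1] (the "dimensions of variation"), a uniform grid with `bins` cells per descriptor dimension, and Gaussian mutation of a uniformly chosen elite. These choices, and all names (`map_elites`, `coverage`, `qd_score`), are illustrative assumptions, not the paper's reference implementation.

```python
import random
from typing import Callable, Dict, List, Tuple

Genome = List[float]
Cell = Tuple[int, ...]


def map_elites(
    fitness: Callable[[Genome], float],
    descriptor: Callable[[Genome], Tuple[float, ...]],
    dim: int,
    bins: int,
    n_init: int = 100,
    n_iters: int = 5000,
    sigma: float = 0.1,
    seed: int = 0,
) -> Dict[Cell, Tuple[float, Genome]]:
    """Illustrative MAP-Elites: one elite (fitness, genome) per grid cell."""
    if n_init < 1:
        raise ValueError("need at least one random initial solution")
    rng = random.Random(seed)
    archive: Dict[Cell, Tuple[float, Genome]] = {}

    def to_cell(desc: Tuple[float, ...]) -> Cell:
        # Assumes descriptors lie in [0, 1]; clamp so 1.0 lands in the last bin.
        return tuple(min(int(d * bins), bins - 1) for d in desc)

    def try_insert(x: Genome) -> None:
        f = fitness(x)
        c = to_cell(descriptor(x))
        # Keep the candidate only if its cell is empty or it beats the elite.
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, x)

    # Seed the map with random genomes.
    for _ in range(n_init):
        try_insert([rng.random() for _ in range(dim)])

    # Main loop: pick a random elite, mutate it, try to place the child.
    for _ in range(n_iters):
        _, parent = rng.choice(list(archive.values()))
        child = [min(max(g + rng.gauss(0.0, sigma), 0.0), 1.0) for g in parent]
        try_insert(child)
    return archive


def coverage(archive: Dict[Cell, Tuple[float, Genome]], bins: int, n_desc: int) -> float:
    """Fraction of grid cells that hold an elite."""
    return len(archive) / bins ** n_desc


def qd_score(archive: Dict[Cell, Tuple[float, Genome]], offset: float = 0.0) -> float:
    """Sum of elite fitnesses, shifted by `offset` so they are non-negative."""
    return sum(f + offset for f, _ in archive.values())


if __name__ == "__main__":
    fit = lambda x: -sum((g - 0.5) ** 2 for g in x)  # toy objective
    desc = lambda x: (x[0], x[1])                     # toy 2-D descriptor
    arch = map_elites(fit, desc, dim=4, bins=10)
    print(len(arch), coverage(arch, 10, 2), qd_score(arch, offset=1.0))
```

Each cell keeps only its best solution, so the returned dictionary is the "map" of high performers across the chosen dimensions rather than a single optimum. Coverage and QD-Score, as sketched here, are common summaries of such an archive; QD-Score is usually defined as a sum of elite fitnesses after an offset that makes them non-negative.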
Representative citing papers
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
No continuous utility-preserving input wrapper can eliminate all prompt injection risks in connected prompt spaces for language models.
Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.
Presents a query-complexity framework for genetic algorithms with guided operators, showing the necessity of multiple operators and giving tight bounds for diversity in solution pools.
FML-Bench shows a simple greedy hill-climber nearly matches tree search on dense-opportunity tasks while an adaptive agent that broadens search on stagnation outperforms six baselines across 18 tasks.
DRSR uses Quality-Diversity to produce diverse symbolic regression expressions differing in residual distributions, enabling post-search selection on synthetic and astronomical data.
FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.
PPol uses LLM-driven evolutionary program search to create diverse human-like user personas for simulators, yielding 33-62% fitness gains and +17% agent task success on retail and airline domains.
EvoPref applies NSGA-II evolutionary optimization with archive-based diversity to populations of LoRA adapters, yielding 18% higher preference coverage and 47% lower collapse than gradient descent baselines while matching alignment quality.
EvolveSignal applies LLM-driven evolutionary program synthesis to discover heuristic variations of traffic signal control logic that reduce delay and stops compared to Webster's method in simulation.
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
EvoPrompt uses LLMs to run evolutionary operators on populations of prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard tasks across 31 datasets.
Phenotypic distance from output differences on fixed inputs enables surrogate models that predict performance of variable-topology neural networks as well as or better than weight-based models on fixed topologies in a robotic navigation task.
External evolution significantly outperforms internal deliberation on collective-action tasks, but neither helps in trading; evolution discovers punishment, whereas deliberation never does.
Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering gains of up to 53% (34% on average) over prior LLM serving systems under runtime dynamics.
LensAgent is a training-free LLM agent framework that reconstructs mass distributions in SLACS strong lensing systems to extract sub-galactic substructures.
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
Mastermind's dual-loop planner learns transferable strategies via SFT and milestone GRPO, raising GPT-5.5 executor pass rate on 200 held-out CyberGym tasks from 60% to 84.5%.
MFEA-CoD coordinates novelty search tasks with repulsion and adaptive transfer to collaboratively discover diverse novel solutions across synthetic, maze, MuJoCo, and generative problems.
Heuresis evaluates six search strategies for autonomous ML research agents and finds that novel ideas are rare (none is rated original and only one reaches top-10 quality); the strategies steer where ideas fall along the quality and novelty axes but do not expand the quality-novelty frontier.
AIChilles finds 49 distinct hidden weaknesses across 30 AI-evolved programs in five applications by combining workload extraction, agent-based constraint inference, differential oracles, and coverage to expose regressions.
SRC is a fixed-horizon branch review framework for imitation learning in resettable web environments that collects 977 verifier-passing trajectories and 9,183 next-action examples while improving recovery-versus-query tradeoff over step-level review.
SV-QD-RL couples actor structure with branch-specific value learning via structure-conditioned actor-critic branches to generate diverse high-quality policy repertoires in QD-RL.
Citing papers explorer
- Explaining Attention with Program Synthesis
  Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.
- FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics
  FML-Bench shows a simple greedy hill-climber nearly matches tree search on dense-opportunity tasks while an adaptive agent that broadens search on stagnation outperforms six baselines across 18 tasks.
- FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
  FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.
- EvolveSignal: A Large Language Model Powered Coding Agent for Discovering Traffic Signal Control Strategies
  EvolveSignal applies LLM-driven evolutionary program synthesis to discover heuristic variations of traffic signal control logic that reduce delay and stops compared to Webster's method in simulation.
- Speculative Rollback Correction for Quality-Diverse Web Agent Imitation
  SRC is a fixed-horizon branch review framework for imitation learning in resettable web environments that collects 977 verifier-passing trajectories and 9,183 next-action examples while improving recovery-versus-query tradeoff over step-level review.
- DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
  DEI shows a heterogeneous four-LLM ensemble achieving 124% higher QD-Score and 28% higher coverage than single-model baselines on Core War at equal compute budget.
- ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
  ToolMol integrates evolutionary algorithms with agentic LLMs and precise RDKit tools to optimize multi-objective drug properties, yielding ligands with over 10% better predicted binding affinity and 35% gains in absolute binding free energy on three protein targets.
- Space Syntax-guided Post-training for Residential Floor Plan Generation
  SSPT turns space-syntax integration metrics into post-training feedback signals that improve public-space dominance and functional hierarchy in AI-generated residential floor plans.
- JSON-Bag: A generic game trajectory representation
  JSON-Bag tokenizes JSON game trajectories, applies Jensen-Shannon distance and prototype nearest-neighbor search to classify agents/parameters/seeds across six tabletop games, outperforming hand-crafted features and correlating with policy distances.
- Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance
  Stable-GFlowNet stabilizes GFN training for LLM red-teaming by eliminating Z estimation via pairwise comparisons and robust masking against noisy rewards while adding a fluency stabilizer.
- The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
  RQGM enables co-evolution of agents and evaluators across epochs with non-stationary utilities, reporting gains in coding pass rates, paper acceptance, and proof grading over prior self-improving agents.
- EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents
  EEVEE introduces a router-based multi-dataset test-time prompt learning framework for LLM agents that uses router-prompt co-evolution to improve robustness on heterogeneous data streams.
- Learning to Solve and Optimize by Evolving Code
  CHECKMATE evolves correct high-performing solvers from formal specs and natural language descriptions, outperforming SOTA on configuration and scheduling problems.
- Distributional Value Estimation Without Target Networks for Robust Quality-Diversity
  QDHUAC is a distributional, target-free QD-RL method that enables stable high-UTD training and competitive performance on Brax locomotion tasks using far fewer environment steps than prior approaches.
- A Compositional Framework for Open-ended Intelligence
  Open-ended intelligence is formalized as the compositional closure L(P,C) of primitives P under operators C, with next primitive prediction proposed as an objective to acquire reusable primitives and grammar for lifelong adaptation.
- Multi-Task Optimization over Networks of Tasks
- Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning