Illuminating search spaces by mapping elites
Mixed citation behavior. Most common role is background (64%).
abstract
Many fields use search algorithms, which automatically explore a search space to find high-performing solutions: chemists search through the space of molecules to discover new drugs; engineers search for stronger, cheaper, safer designs; scientists search for models that best explain data, etc. The goal of search algorithms has traditionally been to return the single highest-performing solution in a search space. Here we describe a new, fundamentally different type of algorithm that is more useful because it provides a holistic view of how high-performing solutions are distributed throughout a search space. It creates a map of high-performing solutions at each point in a space defined by dimensions of variation that a user gets to choose. This Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) algorithm illuminates search spaces, allowing researchers to understand how interesting attributes of solutions combine to affect performance, either positively or, equally of interest, negatively. For example, a drug company may wish to understand how performance changes as the size of molecules and their cost-to-produce vary. MAP-Elites produces a large diversity of high-performing, yet qualitatively different solutions, which can be more helpful than a single, high-performing solution. Interestingly, because MAP-Elites explores more of the search space, it also tends to find a better overall solution than state-of-the-art search algorithms. We demonstrate the benefits of this new algorithm in three different problem domains ranging from producing modular neural networks to designing simulated and real soft robots. Because MAP-Elites (1) illuminates the relationship between performance and dimensions of interest in solutions, (2) returns a set of high-performing, yet diverse solutions, and (3) improves finding a single, best solution, it will advance science and engineering.
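The map-building idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy fitness function, the choice of the first two genes as the user-chosen dimensions of variation, the uniform grid, the Gaussian mutation, and all names and parameter values (`map_elites`, `toy_fitness`, `toy_descriptor`, `bins`, `sigma`) are assumptions made here for illustration only.

```python
import random


def toy_fitness(genome):
    # Hypothetical objective: highest (0.0) when every gene equals 0.5.
    return -sum((g - 0.5) ** 2 for g in genome)


def toy_descriptor(genome):
    # Hypothetical dimensions of variation: the first two genes, each in [0, 1].
    return genome[0], genome[1]


def map_elites(fitness, descriptor, genome_len=4, bins=10,
               n_init=100, n_iters=5000, sigma=0.1, seed=0):
    """Return a dict mapping grid cells to the best (fitness, genome) found there."""
    rng = random.Random(seed)
    archive = {}

    def clip(v):
        return min(1.0, max(0.0, v))

    def try_insert(genome):
        f = fitness(genome)
        # Discretise each descriptor value into one of `bins` intervals.
        cell = tuple(min(bins - 1, int(clip(d) * bins)) for d in descriptor(genome))
        # Keep the candidate only if its cell is empty or it beats the current elite.
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, genome)

    # Phase 1: seed the map with random genomes.
    for _ in range(n_init):
        try_insert([rng.random() for _ in range(genome_len)])

    # Phase 2: repeatedly pick a stored elite, mutate it, and try to place the child.
    for _ in range(n_iters):
        _, parent = rng.choice(list(archive.values()))
        child = [clip(g + rng.gauss(0.0, sigma)) for g in parent]
        try_insert(child)

    return archive


if __name__ == "__main__":
    elites = map_elites(toy_fitness, toy_descriptor)
    best_fitness, best_genome = max(elites.values(), key=lambda e: e[0])
    print(f"filled cells: {len(elites)} of 100")
    print(f"best fitness: {best_fitness:.4f}")
```

The returned dictionary plays the role of the map: each key is a cell in the descriptor grid and each value is the best solution found so far for that cell, so one can inspect how performance varies across the chosen dimensions as well as read off the single best solution.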
citing papers explorer
-
LLM-Evolved Domain-Independent Heuristics for Symbolic AI Planning
LLM-guided evolutionary search yields the first domain-independent C++ planning heuristics that exceed the strongest hand-engineered baselines on coverage and speed trade-offs across unseen domains.
-
Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.
-
The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?
No continuous utility-preserving input wrapper can eliminate all prompt injection risks in connected prompt spaces for language models.
-
Explaining Attention with Program Synthesis
Language-model-guided program synthesis can approximate transformer attention heads with over 75% IoU fidelity on held-out data and allow replacing 25% of heads with only 16% average perplexity increase.
-
FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics
FML-Bench shows a simple greedy hill-climber nearly matches tree search on dense-opportunity tasks while an adaptive agent that broadens search on stagnation outperforms six baselines across 18 tasks.
-
Diversified Residual Symbolic Regression
DRSR uses Quality-Diversity to produce diverse symbolic regression expressions differing in residual distributions, enabling post-search selection on synthetic and astronomical data.
-
FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.
-
Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents
PPol uses LLM-driven evolutionary program search to create diverse human-like user personas for simulators, yielding 33-62% fitness gains and +17% agent task success on retail and airline domains.
-
EvoPref: Multi-Objective Evolutionary Optimization Discovers Diverse LLM Alignments Beyond Gradient Descent
EvoPref applies NSGA-II evolutionary optimization with archive-based diversity to populations of LoRA adapters, yielding 18% higher preference coverage and 47% lower collapse than gradient descent baselines while matching alignment quality.
-
EvolveSignal: A Large Language Model Powered Coding Agent for Discovering Traffic Signal Control Strategies
EvolveSignal applies LLM-driven evolutionary program synthesis to discover heuristic variations of traffic signal control logic that reduce delay and stops compared to Webster's method in simulation.
-
Automated Design of Agentic Systems
Meta Agent Search uses a meta-agent to iteratively program novel agentic systems in code, producing agents that outperform state-of-the-art hand-designed ones across coding, science, and math while transferring across domains and models.
-
EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers
EvoPrompt uses LLMs to run evolutionary operators on populations of prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard tasks across 31 datasets.
-
Prediction of neural network performance by phenotypic modeling
Phenotypic distance from output differences on fixed inputs enables surrogate models that predict performance of variable-topology neural networks as well as or better than weight-based models on fixed topologies in a robotic navigation task.
-
Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design
External evolution beats internal deliberation in collective-action tasks with statistical significance but neither helps in trading, and deliberation never discovers punishment while evolution does.
-
Autopoiesis: A Self-Evolving System Paradigm for LLM Serving Under Runtime Dynamics
Autopoiesis uses LLM-driven program synthesis to evolve serving policies online during deployment, delivering up to 53% and average 34% gains over prior LLM serving systems under runtime dynamics.
-
LensAgent: A Self Evolving Agent for Autonomous Physical Inference of Sub-galactic Structure
LensAgent is a training-free LLM agent framework that reconstructs mass distributions in SLACS strong lensing systems to extract sub-galactic substructures.
-
AlphaEvolve: A coding agent for scientific and algorithmic discovery
AlphaEvolve is an LLM-orchestrated evolutionary coding agent that discovered a 4x4 complex matrix multiplication algorithm using 48 scalar multiplications, the first improvement over Strassen's algorithm in 56 years, plus optimizations for Google data centers and hardware.
-
Heuresis: Search Strategies for Autonomous AI Research Agents Across Quality, Diversity and Novelty
Heuresis evaluates six search strategies for autonomous ML research agents and finds that novel ideas are rare, none rated original, and only one reaches top-10 quality while strategies steer axes but do not expand the quality-novelty frontier.
-
Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning
SV-QD-RL couples actor structure with branch-specific value learning via structure-conditioned actor-critic branches to generate diverse high-quality policy repertoires in QD-RL.
-
U-Net-Accelerated Quality-Diversity Optimization for Climate-Adaptive Urban Layouts
U-Net surrogate enables offline MAP-Elites to achieve R²=0.996 on climate physics and ρ=0.994 fitness ranking using only random samples, while GP surrogates fail without active QD data.
-
Cross-Generational Transfer of Adversarial Attacks Reveals Non-Monotonic Safety Alignment in LLMs
Non-monotonic safety alignment appears in Gemma models, with Gemma 3 at 68.7% ASR versus 45.5% in Gemma 2 and 33.9% in Gemma 4 via MAP-Elites red-teaming and cross-generational attack transfer.
-
Quality-Diversity Evolution for Discovering Diverse Vulnerabilities in LLM Safety
Applies MAP-Elites quality-diversity optimization to evolve semantic attack strategies across dimensions like strategy type, encoding, and length, uncovering distinct vulnerability profiles in four LLMs including GPT-4o-mini and Claude 3.5 Sonnet.
-
Procedural Generation of First Person Shooter Maps using Map-Elites
New Point-Line and Spatial-Layout map representations enable MAP-Elites to produce FPS maps with higher diversity and quality than prior All-Black and Grid-Graph methods.
-
DEI: Diversity in Evolutionary Inference for Quality-Diversity Search
DEI shows a heterogeneous four-LLM ensemble achieving 124% higher QD-Score and 28% higher coverage than single-model baselines on Core War at equal compute budget.
-
Constitutional Arms Races in the Public Goods Game: Co-Evolving LLM Constitutions Under Cooperation-Defection Pressure
Adversarial co-evolution of LLM constitutions in public goods games reaches near-parity equilibrium only when fitness is coupled across factions and evaluation uses at least five seeds per generation.
-
optimize_anything: A Universal API for Optimizing any Text Parameter
A universal LLM optimizer for text artifacts achieves SOTA results on six tasks including tripling ARC-AGI accuracy and cutting cloud costs by 40% via cross-task transfer and side information.
-
PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play
PopuLoRA shows that co-evolving populations of LoRA adapters through cross-evaluated self-play can outperform compute-matched single-agent baselines on multiple code and math reasoning benchmarks.
-
ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery
ToolMol integrates evolutionary algorithms with agentic LLMs and precise RDKit tools to optimize multi-objective drug properties, yielding ligands with over 10% better predicted binding affinity and 35% gains in absolute binding free energy on three protein targets.
-
Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution
QD-LLM applies neuroevolution to prompt embeddings within a quality-diversity framework, producing 46% higher coverage and 41% higher QD-score than QDAIF on HumanEval, MBPP, and creative writing benchmarks.
-
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
Kernel-Smith combines evolutionary search with RL post-training to generate optimized GPU kernels, achieving SOTA speedups on KernelBench that beat Gemini-3.0-pro and Claude-4.6-opus on NVIDIA Triton and generalize to MetaX MACA.
-
Red-Teaming Vision-Language-Action Models via Quality Diversity Prompt Generation for Robust Robot Policies
Q-DIG applies quality diversity optimization with vision-language models to generate diverse adversarial instructions that reveal VLA robot failures and enable robustness improvements via fine-tuning.
-
Space Syntax-guided Post-training for Residential Floor Plan Generation
SSPT turns space-syntax integration metrics into post-training feedback signals that improve public-space dominance and functional hierarchy in AI-generated residential floor plans.
-
Diversifying Toxicity Search in Large Language Models Through Speciation
ToxSearch-S applies unsupervised speciation to evolutionary prompt search, maintaining capacity-limited species with exemplar leaders and species-aware selection to achieve higher peak toxicity and broader semantic coverage than standard methods.
-
Tournament Informed Adversarial Quality Diversity
Tournament-informed task selection in adversarial QD produces higher quality and diversity in coevolved solutions across Pong, cat-and-mouse, and pursuers-evaders games.
-
Motif Diversity in Human Liver ChIP-seq Data Using MAP-Elites
MAP-Elites recovers multiple high-quality motif variants from CTCF ChIP-seq data with fitness comparable to MEME while revealing structured diversity.
-
JSON-Bag: A generic game trajectory representation
JSON-Bag tokenizes JSON game trajectories, applies Jensen-Shannon distance and prototype nearest-neighbor search to classify agents/parameters/seeds across six tabletop games, outperforming hand-crafted features and correlating with policy distances.
-
Adversarial Coevolutionary Illumination with Generational Adversarial MAP-Elites
GAME is a new adversarial coevolutionary QD algorithm using generational alternation and vision embeddings that outperforms one-sided baselines across battle, wrestling, and deck-building tasks while revealing arms-race dynamics and the role of neutral mutations.
-
Automatic Calibration of Artificial Neural Networks for Zebrafish Collective Behaviours using a Quality Diversity Algorithm
CVT-MAP-Elites quality diversity search calibrates ANN-based agent models of zebrafish collective motion to outperform standard evolutionary methods on both macroscopic group metrics and microscopic individual realism.
-
Diverse Agents for Ad-Hoc Cooperation in Hanabi
Quality Diversity algorithms are proposed to generate diverse agent populations for ad-hoc cooperation evaluation in Hanabi, with discussion of metrics and adaptive agent building.
-
HMACE: Heterogeneous Multi-Agent Collaborative Evolution for Combinatorial Optimization
HMACE deploys Proposer, Generator, Evaluator, and Reflector agents in an evolutionary loop to generate and refine heuristics for NP-hard problems, reporting lower optimality gaps and token costs than baselines on TSP and Online BPP.
-
Stable-GFlowNet: Toward Diverse and Robust LLM Red-Teaming via Contrastive Trajectory Balance
Stable-GFlowNet stabilizes GFN training for LLM red-teaming by eliminating Z estimation via pairwise comparisons and robust masking against noisy rewards while adding a fluency stabilizer.
-
Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization
An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.
-
QDTraj: Exploration of Diverse Trajectory Primitives for Articulated Objects Robotic Manipulation
QDTraj uses Quality-Diversity algorithms with sparse rewards to produce at least five times more diverse high-performing trajectories for articulated object manipulation than compared methods, validated across 30 objects with hundreds of trajectories per task.
-
LLM-Guided Prompt Evolution for Password Guessing
LLM-guided evolutionary prompt optimization using MAP-Elites and island models raises password cracking rates from 2.02% to 8.48% on a RockYou-derived test set across local, cloud, and ensemble LLM setups.
-
TurboEvolve: Towards Fast and Robust LLM-Driven Program Evolution
TurboEvolve improves LLM program evolution by running parallel islands with LLM-generated diverse candidates that carry self-assigned weights, an adaptive scheduler, and clustered seed injection to reach stronger solutions at lower evaluation budgets.
-
Uncovering Linguistic Fragility in Vision-Language-Action Models via Diversity-Aware Red Teaming
DAERT generates diverse adversarial instructions via a uniform policy in RL to drop VLA task success rates from 93.33% to 5.85% on benchmarks with models like π0 and OpenVLA.
-
Self-Evolving Agents with Anytime-Valid Certificates
SEA architecture gates self-modifications via anytime-valid certificates on a frozen base model plus five verifier mechanisms, yielding +4 to +5 gains on a SWE-bench subset for two strong bases.
-
TacEvo: Self-Evolving Architecture Discovery for Robotic Tactile Perception via LLM-Driven Quality-Diversity Search
TacEvo is an LLM-driven self-evolving search method that discovers neural architectures for robotic tactile force regression and grating classification, reporting fitness gains of 56.1% and 96.1% over 20 generations.
-
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators
RQGM enables co-evolution of agents and evaluators across epochs with non-stationary utilities, reporting gains in coding pass rates, paper acceptance, and proof grading over prior self-improving agents.
-
Quality-Diversity Search in Sound Generation: Investigating Innovation Engines for Audio Exploration
MAP-Elites with CPPNs, DSP graphs, and a deep classifier produces diverse synthetic sounds across durations and musical/non-musical contexts.