pith. machine review for the scientific record.

arxiv: 2605.09781 · v1 · submitted 2026-05-10 · 💻 cs.NE · cs.AI · cs.CL · cs.LG

Recognition: no theorem link

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Pith reviewed 2026-05-12 03:34 UTC · model grok-4.3

classification 💻 cs.NE · cs.AI · cs.CL · cs.LG
keywords quality-diversity optimization · prompt embeddings · neuroevolution · large language models · mode collapse · parameter-efficient adaptation · behavior characterization · diverse generation

The pith

Evolving compact prompt embeddings inside frozen large language models produces more diverse outputs than standard methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that quality-diversity optimization can be applied directly to prompt embeddings rather than full model weights, allowing large language models to explore wider ranges of valid solutions without retraining. A sympathetic reader would care because current models often repeat similar answers and miss valid alternatives in coding or writing tasks. The approach keeps the main model frozen while evolving a small interface of about 32,000 parameters. It combines semantic and explicit behavior measures that are nearly independent, supporting formal guarantees on how much of the solution space gets covered. Experiments on coding benchmarks and creative writing show the method yields archives with substantially more distinct high-quality outputs than prior quality-diversity baselines for language models.
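
For a sense of scale, here is a minimal sketch of what a ~32K-parameter interface could look like, under the assumption (not stated in the summary above) that it is a soft prompt of a few virtual token embeddings in the frozen backbone's hidden dimension.

```python
import torch

# Hypothetical decomposition of a ~32K-parameter interface: a soft prompt of
# `num_virtual_tokens` embedding vectors in the frozen backbone's hidden size.
# The 4 x 8192 split below is an illustrative assumption, not the paper's figure.
num_virtual_tokens, hidden_dim = 4, 8192
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(num_virtual_tokens, hidden_dim))

print(soft_prompt.numel())  # 32768 evolvable values; the frozen backbone is untouched
```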

Core claim

QD-LLM evolves prompt embeddings via gradient-free quality-diversity optimization to steer frozen LLMs, using hybrid semantic-plus-explicit behavior descriptors that satisfy near-independence conditions and yield formal coverage bounds. On HumanEval, MBPP, and creative writing tasks this produces higher coverage of the solution space and higher QD-scores than previous approaches, with the resulting diverse archives also improving downstream test generation and fine-tuning data quality.

What carries the argument

prompt embedding evolution inside a quality-diversity optimization loop that uses hybrid behavior descriptors and co-evolutionary variation operators
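
Read literally, that is a MAP-Elites-style loop with the prompt embedding as the genome and a two-axis behavior grid. Below is a minimal sketch under assumed interfaces, where `generate`, `quality`, `semantic_bc`, and `explicit_bc` are placeholders for the frozen LLM call, the task scorer, and the two halves of the hybrid behavior characterization; this is not the authors' implementation.

```python
import numpy as np

def qd_loop(generate, quality, semantic_bc, explicit_bc,
            dim=32_768, grid=(10, 10), iters=1_000, sigma=0.05, rng=None):
    """MAP-Elites-style sketch: evolve prompt embeddings, keep one elite per behavior cell."""
    rng = rng or np.random.default_rng(0)
    archive = {}  # (semantic bin, explicit bin) -> (fitness, embedding, text)

    def cell_of(text):
        # Hybrid behavior characterization: one semantic axis and one explicit axis,
        # each assumed to return a value in [0, 1), binned onto the archive grid.
        s = int(np.clip(semantic_bc(text) * grid[0], 0, grid[0] - 1))
        e = int(np.clip(explicit_bc(text) * grid[1], 0, grid[1] - 1))
        return s, e

    for _ in range(iters):
        if archive:
            elites = list(archive.values())
            parent = elites[rng.integers(len(elites))][1]
            child = parent + sigma * rng.standard_normal(dim)  # gradient-free mutation
        else:
            child = 0.02 * rng.standard_normal(dim)            # random initial embedding
        text = generate(child)        # frozen LLM steered by the evolved soft prompt
        fit, cell = quality(text), cell_of(text)
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child, text)                 # replace a weaker elite
    return archive
```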

If this is right

  • Diverse solution archives generated by the method produce 34 percent more edge cases when used for test generation.
  • Data selected from the archives yields an 8.3 percent accuracy gain when used to fine-tune models.
  • The same embedding-evolution process works across multiple open-source LLMs with full embedding access.
  • Formal coverage bounds hold under the observed low normalized mutual information between descriptor types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same compact-interface idea could be tested on tasks outside coding and writing, such as dialogue or planning, to check whether the diversity gains transfer.
  • If prompt embeddings prove controllable across model sizes, future work might combine this approach with other parameter-efficient methods to reduce the cost of maintaining diverse model behaviors.
  • The near-independence of descriptors suggests it may be possible to add more behavior axes without losing the coverage guarantees, provided the new measures also show low mutual information.

Load-bearing premise

The hybrid semantic and explicit behavior descriptors stay sufficiently independent to support the formal coverage bounds, and small prompt embeddings can reliably steer much larger frozen models across the tested tasks.

What would settle it

A direct measurement showing that the hybrid descriptors are strongly dependent, or a run on the same benchmarks where coverage and QD-score fail to exceed the QDAIF baseline by a statistically significant margin.
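
The first of those checks can be run directly on an existing archive. Below is a minimal sketch of a localized descriptor-dependence measurement, using scikit-learn's normalized_mutual_info_score over rank-binned descriptor values; the binning and the grouping of archive cells into regions are placeholder choices, not the paper's protocol.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def localized_nmi(semantic_bc, explicit_bc, region_ids, bins=10):
    """NMI between the two descriptor types, globally and per archive region.

    `semantic_bc` and `explicit_bc` are 1-D arrays of descriptor values for the
    archived solutions; `region_ids` assigns each solution to a group of adjacent
    archive cells. Bin count and region granularity are placeholder choices.
    """
    def discretize(x):
        # Rank-based equal-frequency binning (robust to ties).
        ranks = np.argsort(np.argsort(x))
        return (ranks * bins) // len(x)

    s, e = discretize(np.asarray(semantic_bc)), discretize(np.asarray(explicit_bc))
    region_ids = np.asarray(region_ids)
    out = {"global": normalized_mutual_info_score(s, e)}
    for r in np.unique(region_ids):
        mask = region_ids == r
        if mask.sum() > bins:  # need enough samples for a meaningful estimate
            out[f"region_{r}"] = normalized_mutual_info_score(s[mask], e[mask])
    return out
```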

Figures

Figures reproduced from arXiv: 2605.09781 by Dongxin Guo, Jikun Wu, Siu Ming Yiu.

Figure 1: Overview of QD-LLM. Each archive cell stores a (text, prompt embedding) pair. Co-evolutionary operators jointly mutate soft prompt embeddings p and output text t. Hybrid BC combines semantic and explicit features.
Figure 2: Archive coverage dynamics over evaluations.
Figure 3: Diverse solutions from QD-LLM archive for list intersection. Each occupies a distinct behavioral cell based on paradigm (iterative/recursive/functional/library), demonstrating meaningful algorithmic diversity beyond surface variation.
read the original abstract

Large Language Models exhibit mode collapse, producing homogeneous outputs that fail to explore valid solution spaces. We present QD-LLM, a framework for parameter-efficient neuroevolution that evolves prompt embeddings, compact neural interfaces (~32K parameters) that steer generation in frozen LLMs (70B+ parameters), within a Quality-Diversity (QD) optimization framework. Our contributions: (1) evolved prompt embeddings via gradient-free optimization enabling behavioral steering without model fine-tuning; (2) hybrid behavior characterization combining semantic and explicit features with formal coverage bounds (Theorem 1) under validated near-independence (NMI $= 0.08 \pm 0.02$); (3) co-evolutionary variation operators including targeted behavioral mutation via finite-difference gradient estimation. On HumanEval (164 problems), MBPP, and creative writing benchmarks, QD-LLM achieves 46.4% higher coverage and 41.4% higher QD-Score than QDAIF ($p<0.001$, 30 runs, Vargha-Delaney $A=0.94$). We demonstrate downstream utility: diverse archives improve test generation (34% more edge cases) and fine-tuning data quality (8.3% accuracy gain). We validate across open-source LLMs (Llama-3-70B, Mistral-Large) with full embedding access, establishing prompt embedding evolution as an effective paradigm bridging neuroevolution and modern LLMs.
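
One phrase in the abstract worth unpacking is "targeted behavioral mutation via finite-difference gradient estimation". A plausible reading is a random-direction finite-difference estimate of how the behavior descriptor shifts under small embedding perturbations, followed by a step toward a target cell; the sketch below follows that reading, with the probe count, step size, and `behavior_fn` as assumptions rather than the paper's settings.

```python
import numpy as np

def targeted_mutation(embedding, behavior_fn, target_bc, step=1e-2, probes=8, lr=0.5, rng=None):
    """Nudge a prompt embedding toward a target behavior descriptor.

    `behavior_fn` maps an embedding to a low-dimensional BC (via generation plus
    descriptor extraction); the gradient of the BC error is estimated with
    random-direction finite differences, since no analytic gradient is available.
    """
    rng = rng or np.random.default_rng(0)
    base_err = np.linalg.norm(behavior_fn(embedding) - target_bc)
    grad = np.zeros_like(embedding)
    for _ in range(probes):
        u = rng.standard_normal(embedding.shape)
        u /= np.linalg.norm(u)
        err = np.linalg.norm(behavior_fn(embedding + step * u) - target_bc)
        grad += (err - base_err) / step * u   # directional derivative estimate
    grad /= probes
    return embedding - lr * grad              # step toward the target behavior cell
```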

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes QD-LLM, a parameter-efficient neuroevolution framework that evolves compact prompt embeddings (~32K parameters) to steer frozen LLMs (70B+) within a Quality-Diversity optimization loop. It introduces hybrid semantic-explicit behavior descriptors supported by a formal coverage theorem (Theorem 1) under low NMI, co-evolutionary variation operators, and reports 46.4% higher coverage and 41.4% higher QD-Score than QDAIF on HumanEval, MBPP, and creative writing benchmarks, with downstream gains in test generation and fine-tuning data quality.

Significance. If the empirical gains and formal bounds hold, the work offers a scalable way to mitigate mode collapse in LLMs by evolving a small number of steerable parameters rather than fine-tuning the full model. The reported effect sizes, multiple-run statistics, cross-model validation on Llama-3-70B and Mistral-Large, and downstream utility demonstrations constitute a concrete advance at the intersection of neuroevolution and modern generative models.

major comments (2)
  1. [Theorem 1] The coverage bounds rest on the assumption that semantic and explicit behavior descriptors remain near-independent, validated only by a single global NMI value of 0.08 ± 0.02. Because NMI is an aggregate scalar, it does not rule out localized correlations that may arise in the specific regions of behavior space populated by the evolved prompt-embedding archive; without an archive-specific or subspace-localized independence check, the formal guarantee does not necessarily apply to the observed coverage numbers.
  2. [Experimental Results] Experimental protocol: The headline performance claims (46.4% coverage gain, p<0.001, 30 runs, Vargha-Delaney A=0.94) are presented without the full experimental protocol, hyperparameter search details, ablation studies isolating the hybrid descriptors from the finite-difference mutation operators, or exact embedding dimension used. This absence prevents independent verification that the gains are robust and not attributable to post-hoc choices or unequal evaluation budgets.
minor comments (2)
  1. Replace the approximate description '~32K parameters' with the exact prompt-embedding dimension employed in all reported experiments.
  2. [Abstract] Clarify whether the same embedding dimension and architecture were used for both Llama-3-70B and Mistral-Large, and state the precise access requirements for the embedding interface.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments on Theorem 1 and the experimental protocol raise important points about the strength of our formal guarantees and reproducibility. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Theorem 1] The coverage bounds rest on the assumption that semantic and explicit behavior descriptors remain near-independent, validated only by a single global NMI value of 0.08 ± 0.02. Because NMI is an aggregate scalar, it does not rule out localized correlations that may arise in the specific regions of behavior space populated by the evolved prompt-embedding archive; without an archive-specific or subspace-localized independence check, the formal guarantee does not necessarily apply to the observed coverage numbers.

    Authors: We agree that a global NMI statistic, while indicative of overall low dependence, does not by itself confirm independence within the specific subspaces occupied by the QD archive. To strengthen the link between Theorem 1 and the reported coverage results, we will add an archive-specific analysis in the revised manuscript. This will include (i) partitioning the behavior space into the cells actually populated by the final archive and (ii) computing NMI within each populated cell (or small groups of adjacent cells) using the same descriptor pairs. We expect these localized values to remain low, thereby providing direct empirical support for the applicability of the coverage bound to the observed archives. The additional figures and tables will be placed in the main text or supplementary material as appropriate. revision: yes

  2. Referee: [Experimental Results] Experimental protocol: The headline performance claims (46.4% coverage gain, p<0.001, 30 runs, Vargha-Delaney A=0.94) are presented without the full experimental protocol, hyperparameter search details, ablation studies isolating the hybrid descriptors from the finite-difference mutation operators, or exact embedding dimension used. This absence prevents independent verification that the gains are robust and not attributable to post-hoc choices or unequal evaluation budgets.

    Authors: We acknowledge that the original submission omitted several details necessary for full reproducibility. In the revised manuscript we will expand the experimental section and add a dedicated reproducibility appendix containing: (1) the complete hyperparameter search procedure, ranges, and final selected values for all methods; (2) ablation experiments that separately disable the hybrid descriptor combination and the finite-difference mutation operator while keeping all other factors fixed; (3) the precise embedding dimension (32,768 parameters) together with the prompt template and tokenization details; and (4) explicit confirmation that all compared algorithms were allocated identical evaluation budgets (number of LLM calls). These additions will allow independent verification that the reported gains are attributable to the proposed components rather than unequal resources or post-hoc tuning. revision: yes
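
For readers unfamiliar with the statistics quoted above, here is a small sketch of the Vargha-Delaney A effect size alongside a rank-based significance test over per-run QD-Scores; the sample arrays are synthetic placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def vargha_delaney_a(x, y):
    """P(X > Y) + 0.5 * P(X = Y): probability that a random run of method X
    beats a random run of method Y. A = 0.5 means no effect; ~0.94 is a large effect."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    gt = (x[:, None] > y[None, :]).mean()
    eq = (x[:, None] == y[None, :]).mean()
    return gt + 0.5 * eq

# Synthetic per-run QD-Scores for 30 runs of each method (placeholders only).
qdllm_scores = np.random.default_rng(1).normal(1.4, 0.1, 30)
qdaif_scores = np.random.default_rng(2).normal(1.0, 0.1, 30)
print(vargha_delaney_a(qdllm_scores, qdaif_scores),
      mannwhitneyu(qdllm_scores, qdaif_scores, alternative="greater").pvalue)
```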

Circularity Check

0 steps flagged

No circularity: results are empirical benchmark comparisons with separate NMI validation for Theorem 1

full rationale

The paper reports performance via direct experiments on HumanEval, MBPP, and creative-writing benchmarks, showing 46.4% higher coverage and 41.4% higher QD-Score than QDAIF, with p-values and effect sizes from 30 runs. Theorem 1 supplies coverage bounds only under the separately measured assumption of descriptor near-independence (NMI = 0.08 ± 0.02); this NMI is an empirical scalar computed on the data, not a quantity defined by the coverage metric itself. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citation chain is invoked to justify uniqueness or an ansatz, and the hybrid-descriptor construction is presented as an input choice whose independence is externally checked rather than tautological. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the unproven transferability of small prompt embeddings to control 70B-scale frozen models and on the near-independence assumption used for coverage bounds; no new physical entities are postulated.

free parameters (1)
  • prompt embedding dimension
    Compact neural interface of approximately 32K parameters chosen to balance steering power and optimization cost.
axioms (1)
  • domain assumption: Hybrid semantic and explicit features are near-independent (NMI = 0.08 ± 0.02)
    Invoked to justify formal coverage bounds in Theorem 1.
invented entities (1)
  • prompt embedding as steerable neural interface (no independent evidence)
    purpose: Compact interface that modulates frozen LLM behavior without weight updates
    Introduced as the evolvable object in the QD archive; no independent falsifiable prediction outside the reported benchmarks is given.

pith-pipeline@v0.9.0 · 5571 in / 1457 out tokens · 42561 ms · 2026-05-12T03:34:01.371350+00:00 · methodology

