pith. machine review for the scientific record.

arxiv: 2605.09781 · v1 · submitted 2026-05-10 · 💻 cs.NE · cs.AI · cs.CL · cs.LG

Recognition: no theorem link

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution

Dongxin Guo, Jikun Wu, Siu Ming Yiu

Pith reviewed 2026-05-12 03:34 UTC · model grok-4.3

classification 💻 cs.NE · cs.AI · cs.CL · cs.LG
keywords quality-diversity optimization · prompt embeddings · neuroevolution · large language models · mode collapse · parameter-efficient adaptation · behavior characterization · diverse generation

The pith

Evolving compact prompt embeddings inside frozen large language models produces more diverse outputs than standard methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that quality-diversity optimization can be applied directly to prompt embeddings rather than full model weights, allowing large language models to explore wider ranges of valid solutions without retraining. A sympathetic reader would care because current models often repeat similar answers and miss valid alternatives in coding or writing tasks. The approach keeps the main model frozen while evolving a small interface of about 32,000 parameters. It combines semantic and explicit behavior measures that are nearly independent, supporting formal guarantees on how much of the solution space gets covered. Experiments on coding benchmarks and creative writing show the method yields archives with substantially more distinct high-quality outputs than prior quality-diversity baselines for language models.
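
For a sense of scale, here is a minimal sketch of what a ~32K-parameter interface could look like, under the assumption (not stated in the summary above) that it is a soft prompt of a few virtual token embeddings in the frozen backbone's hidden dimension.

```python
import torch

# Hypothetical decomposition of a ~32K-parameter interface: a soft prompt of
# `num_virtual_tokens` embedding vectors in the frozen backbone's hidden size.
# The 4 x 8192 split below is an illustrative assumption, not the paper's figure.
num_virtual_tokens, hidden_dim = 4, 8192
soft_prompt = torch.nn.Parameter(0.02 * torch.randn(num_virtual_tokens, hidden_dim))

print(soft_prompt.numel())  # 32768 evolvable values; the frozen backbone is untouched
```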

Core claim

QD-LLM evolves prompt embeddings via gradient-free quality-diversity optimization to steer frozen LLMs, using hybrid semantic-plus-explicit behavior descriptors that satisfy near-independence conditions and yield formal coverage bounds. On HumanEval, MBPP, and creative writing tasks this produces higher coverage of the solution space and higher QD-scores than previous approaches, with the resulting diverse archives also improving downstream test generation and fine-tuning data quality.

What carries the argument

prompt embedding evolution inside a quality-diversity optimization loop that uses hybrid behavior descriptors and co-evolutionary variation operators
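
Read literally, that is a MAP-Elites-style loop with the prompt embedding as the genome and a two-axis behavior grid. Below is a minimal sketch under assumed interfaces, where `generate`, `quality`, `semantic_bc`, and `explicit_bc` are placeholders for the frozen LLM call, the task scorer, and the two halves of the hybrid behavior characterization; this is not the authors' implementation.

```python
import numpy as np

def qd_loop(generate, quality, semantic_bc, explicit_bc,
            dim=32_768, grid=(10, 10), iters=1_000, sigma=0.05, rng=None):
    """MAP-Elites-style sketch: evolve prompt embeddings, keep one elite per behavior cell."""
    rng = rng or np.random.default_rng(0)
    archive = {}  # (semantic bin, explicit bin) -> (fitness, embedding, text)

    def cell_of(text):
        # Hybrid behavior characterization: one semantic axis and one explicit axis,
        # each assumed to return a value in [0, 1), binned onto the archive grid.
        s = int(np.clip(semantic_bc(text) * grid[0], 0, grid[0] - 1))
        e = int(np.clip(explicit_bc(text) * grid[1], 0, grid[1] - 1))
        return s, e

    for _ in range(iters):
        if archive:
            elites = list(archive.values())
            parent = elites[rng.integers(len(elites))][1]
            child = parent + sigma * rng.standard_normal(dim)  # gradient-free mutation
        else:
            child = 0.02 * rng.standard_normal(dim)            # random initial embedding
        text = generate(child)        # frozen LLM steered by the evolved soft prompt
        fit, cell = quality(text), cell_of(text)
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child, text)                 # replace a weaker elite
    return archive
```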

If this is right

  • Diverse solution archives generated by the method produce 34 percent more edge cases when used for test generation.
  • Data selected from the archives yields an 8.3 percent accuracy gain when used to fine-tune models.
  • The same embedding-evolution process works across multiple open-source LLMs with full embedding access.
  • Formal coverage bounds hold under the observed low normalized mutual information between descriptor types.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same compact-interface idea could be tested on tasks outside coding and writing, such as dialogue or planning, to check whether the diversity gains transfer.
  • If prompt embeddings prove controllable across model sizes, future work might combine this approach with other parameter-efficient methods to reduce the cost of maintaining diverse model behaviors.
  • The near-independence of descriptors suggests it may be possible to add more behavior axes without losing the coverage guarantees, provided the new measures also show low mutual information.

Load-bearing premise

The hybrid semantic and explicit behavior descriptors stay sufficiently independent to support the formal coverage bounds, and small prompt embeddings can reliably steer much larger frozen models across the tested tasks.

What would settle it

A direct measurement showing that the hybrid descriptors are strongly dependent, or a run on the same benchmarks where coverage and QD-score fail to exceed the QDAIF baseline by a statistically significant margin.
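
The first of those checks can be run directly on an existing archive. Below is a minimal sketch of a localized descriptor-dependence measurement, using scikit-learn's normalized_mutual_info_score over rank-binned descriptor values; the binning and the grouping of archive cells into regions are placeholder choices, not the paper's protocol.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def localized_nmi(semantic_bc, explicit_bc, region_ids, bins=10):
    """NMI between the two descriptor types, globally and per archive region.

    `semantic_bc` and `explicit_bc` are 1-D arrays of descriptor values for the
    archived solutions; `region_ids` assigns each solution to a group of adjacent
    archive cells. Bin count and region granularity are placeholder choices.
    """
    def discretize(x):
        # Rank-based equal-frequency binning (robust to ties).
        ranks = np.argsort(np.argsort(x))
        return (ranks * bins) // len(x)

    s, e = discretize(np.asarray(semantic_bc)), discretize(np.asarray(explicit_bc))
    region_ids = np.asarray(region_ids)
    out = {"global": normalized_mutual_info_score(s, e)}
    for r in np.unique(region_ids):
        mask = region_ids == r
        if mask.sum() > bins:  # need enough samples for a meaningful estimate
            out[f"region_{r}"] = normalized_mutual_info_score(s[mask], e[mask])
    return out
```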

Figures

Figures reproduced from arXiv: 2605.09781 by Dongxin Guo, Jikun Wu, Siu Ming Yiu.

Figure 1: Overview of QD-LLM. Each archive cell stores a (text, prompt embedding) pair. Co-evolutionary operators jointly mutate soft prompt embeddings p and output text t. Hybrid BC combines semantic and explicit features.
Figure 2: Archive coverage dynamics over evaluations.
Figure 3: Diverse solutions from QD-LLM archive for list intersection. Each occupies a distinct behavioral cell based on paradigm (iterative/recursive/functional/library), demonstrating meaningful algorithmic diversity beyond surface variation.
read the original abstract

Large Language Models exhibit mode collapse, producing homogeneous outputs that fail to explore valid solution spaces. We present QD-LLM, a framework for parameter-efficient neuroevolution that evolves prompt embeddings, compact neural interfaces (~32K parameters) that steer generation in frozen LLMs (70B+ parameters), within a Quality-Diversity (QD) optimization framework. Our contributions: (1) evolved prompt embeddings via gradient-free optimization enabling behavioral steering without model fine-tuning; (2) hybrid behavior characterization combining semantic and explicit features with formal coverage bounds (Theorem 1) under validated near-independence (NMI $= 0.08 \pm 0.02$); (3) co-evolutionary variation operators including targeted behavioral mutation via finite-difference gradient estimation. On HumanEval (164 problems), MBPP, and creative writing benchmarks, QD-LLM achieves 46.4% higher coverage and 41.4% higher QD-Score than QDAIF ($p<0.001$, 30 runs, Vargha-Delaney $A=0.94$). We demonstrate downstream utility: diverse archives improve test generation (34% more edge cases) and fine-tuning data quality (8.3% accuracy gain). We validate across open-source LLMs (Llama-3-70B, Mistral-Large) with full embedding access, establishing prompt embedding evolution as an effective paradigm bridging neuroevolution and modern LLMs.
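
One phrase in the abstract worth unpacking is "targeted behavioral mutation via finite-difference gradient estimation". A plausible reading is a random-direction finite-difference estimate of how the behavior descriptor shifts under small embedding perturbations, followed by a step toward a target cell; the sketch below follows that reading, with the probe count, step size, and `behavior_fn` as assumptions rather than the paper's settings.

```python
import numpy as np

def targeted_mutation(embedding, behavior_fn, target_bc, step=1e-2, probes=8, lr=0.5, rng=None):
    """Nudge a prompt embedding toward a target behavior descriptor.

    `behavior_fn` maps an embedding to a low-dimensional BC (via generation plus
    descriptor extraction); the gradient of the BC error is estimated with
    random-direction finite differences, since no analytic gradient is available.
    """
    rng = rng or np.random.default_rng(0)
    base_err = np.linalg.norm(behavior_fn(embedding) - target_bc)
    grad = np.zeros_like(embedding)
    for _ in range(probes):
        u = rng.standard_normal(embedding.shape)
        u /= np.linalg.norm(u)
        err = np.linalg.norm(behavior_fn(embedding + step * u) - target_bc)
        grad += (err - base_err) / step * u   # directional derivative estimate
    grad /= probes
    return embedding - lr * grad              # step toward the target behavior cell
```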

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes QD-LLM, a parameter-efficient neuroevolution framework that evolves compact prompt embeddings (~32K parameters) to steer frozen LLMs (70B+) within a Quality-Diversity optimization loop. It introduces hybrid semantic-explicit behavior descriptors supported by a formal coverage theorem (Theorem 1) under low NMI, co-evolutionary variation operators, and reports 46.4% higher coverage and 41.4% higher QD-Score than QDAIF on HumanEval, MBPP, and creative writing benchmarks, with downstream gains in test generation and fine-tuning data quality.

Significance. If the empirical gains and formal bounds hold, the work offers a scalable way to mitigate mode collapse in LLMs by evolving a small number of steerable parameters rather than fine-tuning the full model. The reported effect sizes, multiple-run statistics, cross-model validation on Llama-3-70B and Mistral-Large, and downstream utility demonstrations constitute a concrete advance at the intersection of neuroevolution and modern generative models.

major comments (2)
  1. [Theorem 1] The coverage bounds rest on the assumption that semantic and explicit behavior descriptors remain near-independent, validated only by a single global NMI value of 0.08 ± 0.02. Because NMI is an aggregate scalar, it does not rule out localized correlations that may arise in the specific regions of behavior space populated by the evolved prompt-embedding archive; without an archive-specific or subspace-localized independence check, the formal guarantee does not necessarily apply to the observed coverage numbers.
  2. [Experimental Results] Experimental protocol: The headline performance claims (46.4% coverage gain, p<0.001, 30 runs, Vargha-Delaney A=0.94) are presented without the full experimental protocol, hyperparameter search details, ablation studies isolating the hybrid descriptors from the finite-difference mutation operators, or exact embedding dimension used. This absence prevents independent verification that the gains are robust and not attributable to post-hoc choices or unequal evaluation budgets.
minor comments (2)
  1. Replace the approximate description '~32K parameters' with the exact prompt-embedding dimension employed in all reported experiments.
  2. [Abstract] Clarify whether the same embedding dimension and architecture were used for both Llama-3-70B and Mistral-Large, and state the precise access requirements for the embedding interface.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments on Theorem 1 and the experimental protocol raise important points about the strength of our formal guarantees and reproducibility. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Theorem 1] The coverage bounds rest on the assumption that semantic and explicit behavior descriptors remain near-independent, validated only by a single global NMI value of 0.08 ± 0.02. Because NMI is an aggregate scalar, it does not rule out localized correlations that may arise in the specific regions of behavior space populated by the evolved prompt-embedding archive; without an archive-specific or subspace-localized independence check, the formal guarantee does not necessarily apply to the observed coverage numbers.

    Authors: We agree that a global NMI statistic, while indicative of overall low dependence, does not by itself confirm independence within the specific subspaces occupied by the QD archive. To strengthen the link between Theorem 1 and the reported coverage results, we will add an archive-specific analysis in the revised manuscript. This will include (i) partitioning the behavior space into the cells actually populated by the final archive and (ii) computing NMI within each populated cell (or small groups of adjacent cells) using the same descriptor pairs. We expect these localized values to remain low, thereby providing direct empirical support for the applicability of the coverage bound to the observed archives. The additional figures and tables will be placed in the main text or supplementary material as appropriate. revision: yes

  2. Referee: [Experimental Results] Experimental protocol: The headline performance claims (46.4% coverage gain, p<0.001, 30 runs, Vargha-Delaney A=0.94) are presented without the full experimental protocol, hyperparameter search details, ablation studies isolating the hybrid descriptors from the finite-difference mutation operators, or exact embedding dimension used. This absence prevents independent verification that the gains are robust and not attributable to post-hoc choices or unequal evaluation budgets.

    Authors: We acknowledge that the original submission omitted several details necessary for full reproducibility. In the revised manuscript we will expand the experimental section and add a dedicated reproducibility appendix containing: (1) the complete hyperparameter search procedure, ranges, and final selected values for all methods; (2) ablation experiments that separately disable the hybrid descriptor combination and the finite-difference mutation operator while keeping all other factors fixed; (3) the precise embedding dimension (32,768 parameters) together with the prompt template and tokenization details; and (4) explicit confirmation that all compared algorithms were allocated identical evaluation budgets (number of LLM calls). These additions will allow independent verification that the reported gains are attributable to the proposed components rather than unequal resources or post-hoc tuning. revision: yes
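
For readers unfamiliar with the statistics quoted above, here is a small sketch of the Vargha-Delaney A effect size alongside a rank-based significance test over per-run QD-Scores; the sample arrays are synthetic placeholders, not the paper's data.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def vargha_delaney_a(x, y):
    """P(X > Y) + 0.5 * P(X = Y): probability that a random run of method X
    beats a random run of method Y. A = 0.5 means no effect; ~0.94 is a large effect."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    gt = (x[:, None] > y[None, :]).mean()
    eq = (x[:, None] == y[None, :]).mean()
    return gt + 0.5 * eq

# Synthetic per-run QD-Scores for 30 runs of each method (placeholders only).
qdllm_scores = np.random.default_rng(1).normal(1.4, 0.1, 30)
qdaif_scores = np.random.default_rng(2).normal(1.0, 0.1, 30)
print(vargha_delaney_a(qdllm_scores, qdaif_scores),
      mannwhitneyu(qdllm_scores, qdaif_scores, alternative="greater").pvalue)
```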

Circularity Check

0 steps flagged

No circularity: results are empirical benchmark comparisons with separate NMI validation for Theorem 1

full rationale

The paper reports performance via direct experiments on HumanEval, MBPP, and creative-writing benchmarks, showing 46.4% higher coverage and 41.4% higher QD-Score than QDAIF, with p-values and effect sizes from 30 runs. Theorem 1 supplies coverage bounds only under the separately measured assumption of descriptor near-independence (NMI = 0.08 ± 0.02); this NMI is an empirical scalar computed on the data, not a quantity defined by the coverage metric itself. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citation chain is invoked to justify uniqueness or an ansatz, and the hybrid-descriptor construction is presented as an input choice whose independence is externally checked rather than tautological. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on the unproven transferability of small prompt embeddings to control 70B-scale frozen models and on the near-independence assumption used for coverage bounds; no new physical entities are postulated.

free parameters (1)
  • prompt embedding dimension
    Compact neural interface of approximately 32K parameters chosen to balance steering power and optimization cost.
axioms (1)
  • domain assumption: Hybrid semantic and explicit features are near-independent (NMI = 0.08 ± 0.02)
    Invoked to justify formal coverage bounds in Theorem 1.
invented entities (1)
  • prompt embedding as steerable neural interface (no independent evidence)
    purpose: Compact interface that modulates frozen LLM behavior without weight updates
    Introduced as the evolvable object in the QD archive; no independent falsifiable prediction outside the reported benchmarks is given.

pith-pipeline@v0.9.0 · 5571 in / 1457 out tokens · 42561 ms · 2026-05-12T03:34:01.371350+00:00 · methodology

