Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution
Pith reviewed 2026-05-12 03:34 UTC · model grok-4.3
The pith
Evolving compact prompt embeddings to steer frozen large language models yields more diverse outputs than prior quality-diversity baselines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
QD-LLM evolves prompt embeddings via gradient-free quality-diversity optimization to steer frozen LLMs, using hybrid semantic-plus-explicit behavior descriptors that satisfy near-independence conditions and yield formal coverage bounds. On HumanEval, MBPP, and creative writing tasks this produces higher coverage of the solution space and higher QD-scores than previous approaches, with the resulting diverse archives also improving downstream test generation and fine-tuning data quality.
What carries the argument
prompt embedding evolution inside a quality-diversity optimization loop that uses hybrid behavior descriptors and co-evolutionary variation operators
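The loop named above follows the MAP-Elites pattern: maintain an archive keyed by discretized behavior descriptors, mutate elites, and keep a child only if it improves its cell. A minimal toy sketch (the names, the surrogate `evaluate`, and all constants are hypothetical stand-ins; the paper's real evaluation runs a frozen LLM on each embedding and measures descriptors of the generated text):

```python
import random

# Toy MAP-Elites over "prompt embedding" vectors. In the paper's setting,
# evaluate() would query a frozen 70B+ model; here it is a cheap surrogate.
DIM, GRID, ITERS = 8, 5, 2000
random.seed(0)

def evaluate(emb):
    """Return (fitness, cell): quality plus two binned behavior descriptors."""
    fitness = -sum(x * x for x in emb)          # quality (higher is better)
    d1 = ((sum(emb) / DIM + 1.0) / 2.0) % 1.0   # descriptor axis 1 in [0, 1)
    d2 = ((max(emb) + 1.0) / 2.0) % 1.0         # descriptor axis 2 in [0, 1)
    return fitness, (min(int(d1 * GRID), GRID - 1),
                     min(int(d2 * GRID), GRID - 1))

archive = {}  # cell -> (fitness, embedding); one elite per behavior cell
for _ in range(ITERS):
    if archive:  # mutate a random elite; otherwise sample a fresh embedding
        _, parent = random.choice(list(archive.values()))
        child = [x + random.gauss(0, 0.1) for x in parent]
    else:
        child = [random.uniform(-1, 1) for _ in range(DIM)]
    fit, cell = evaluate(child)
    if cell not in archive or fit > archive[cell][0]:
        archive[cell] = (fit, child)

coverage = len(archive) / GRID ** 2   # fraction of behavior cells filled
qd_score = sum(f for f, _ in archive.values())
print(f"coverage={coverage:.2f} cells={len(archive)}")
```

Coverage and QD-Score as computed here are the two archive-level metrics the review's headline numbers (46.4% and 41.4% gains) refer to.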
If this is right
- Diverse solution archives generated by the method produce 34 percent more edge cases when used for test generation.
- Data selected from the archives yields an 8.3 percent accuracy gain when used to fine-tune models.
- The same embedding-evolution process works across multiple open-source LLMs with full embedding access.
- Formal coverage bounds hold under the observed low normalized mutual information between descriptor types.
Where Pith is reading between the lines
- The same compact-interface idea could be tested on tasks outside coding and writing, such as dialogue or planning, to check whether the diversity gains transfer.
- If prompt embeddings prove controllable across model sizes, future work might combine this approach with other parameter-efficient methods to reduce the cost of maintaining diverse model behaviors.
- The near-independence of descriptors suggests it may be possible to add more behavior axes without losing the coverage guarantees, provided the new measures also show low mutual information.
Load-bearing premise
The hybrid semantic and explicit behavior descriptors stay sufficiently independent to support the formal coverage bounds, and small prompt embeddings can reliably steer much larger frozen models across the tested tasks.
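The paper's Theorem 1 is not reproduced in this review. As a generic illustration of why near-independence is load-bearing: if the two descriptor axes had independent marginal cell probabilities $p_i$ and $q_j$, joint cell mass would factorize, and under i.i.d. sampling the expected coverage of a $K$-cell joint grid after $N$ evaluations would take a coupon-collector form:

```latex
P(b_1 \in i,\ b_2 \in j) \approx p_i\, q_j
\qquad\Longrightarrow\qquad
\mathbb{E}[\mathrm{Cov}_N] \;=\; \frac{1}{K}\sum_{i,j}\Bigl(1 - (1 - p_i q_j)^{N}\Bigr).
```

Under independence no joint cell has vanishing mass unless one of its marginals does; strong dependence would concentrate mass on a lower-dimensional subset of cells, which is exactly the failure mode the premise rules out.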
What would settle it
A direct measurement showing that the hybrid descriptors are strongly dependent, or a run on the same benchmarks where coverage and QD-score fail to exceed the QDAIF baseline by a statistically significant margin.
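The Vargha-Delaney $A$ statistic named in that criterion has a simple closed form: the probability that a randomly chosen run of one method beats a randomly chosen run of the other, with ties counted half. A minimal sketch with illustrative random scores (not the paper's data):

```python
import random

def vargha_delaney_a(xs, ys):
    """A = P(X > Y) + 0.5 * P(X == Y); 0.5 means no effect, 1.0 total dominance."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)
    return wins / (len(xs) * len(ys))

# Hypothetical coverage scores from 30 runs per method (fabricated for illustration).
random.seed(1)
qd_llm = [random.gauss(0.70, 0.05) for _ in range(30)]
qdaif  = [random.gauss(0.48, 0.05) for _ in range(30)]
print(f"A = {vargha_delaney_a(qd_llm, qdaif):.2f}")
```

The reported $A = 0.94$ means a QD-LLM run beat a QDAIF run in roughly 94% of pairwise comparisons; a replication that drove this toward 0.5 would settle the question in the other direction.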
Original abstract
Large Language Models exhibit mode collapse, producing homogeneous outputs that fail to explore valid solution spaces. We present QD-LLM, a framework for parameter-efficient neuroevolution that evolves prompt embeddings, compact neural interfaces (~32K parameters) that steer generation in frozen LLMs (70B+ parameters), within a Quality-Diversity (QD) optimization framework. Our contributions: (1) evolved prompt embeddings via gradient-free optimization enabling behavioral steering without model fine-tuning; (2) hybrid behavior characterization combining semantic and explicit features with formal coverage bounds (Theorem 1) under validated near-independence (NMI $= 0.08 \pm 0.02$); (3) co-evolutionary variation operators including targeted behavioral mutation via finite-difference gradient estimation. On HumanEval (164 problems), MBPP, and creative writing benchmarks, QD-LLM achieves 46.4% higher coverage and 41.4% higher QD-Score than QDAIF ($p<0.001$, 30 runs, Vargha-Delaney $A=0.94$). We demonstrate downstream utility: diverse archives improve test generation (34% more edge cases) and fine-tuning data quality (8.3% accuracy gain). We validate across open-source LLMs (Llama-3-70B, Mistral-Large) with full embedding access, establishing prompt embedding evolution as an effective paradigm bridging neuroevolution and modern LLMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes QD-LLM, a parameter-efficient neuroevolution framework that evolves compact prompt embeddings (~32K parameters) to steer frozen LLMs (70B+) within a Quality-Diversity optimization loop. It introduces hybrid semantic-explicit behavior descriptors supported by a formal coverage theorem (Theorem 1) under low NMI, co-evolutionary variation operators, and reports 46.4% higher coverage and 41.4% higher QD-Score than QDAIF on HumanEval, MBPP, and creative writing benchmarks, with downstream gains in test generation and fine-tuning data quality.
Significance. If the empirical gains and formal bounds hold, the work offers a scalable way to mitigate mode collapse in LLMs by evolving a small number of steerable parameters rather than fine-tuning the full model. The reported effect sizes, multiple-run statistics, cross-model validation on Llama-3-70B and Mistral-Large, and downstream utility demonstrations constitute a concrete advance at the intersection of neuroevolution and modern generative models.
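The "targeted behavioral mutation via finite-difference gradient estimation" summarized above can be sketched as follows. All names are hypothetical and the scalar `descriptor` is a toy stand-in; in the paper, each descriptor evaluation would require generating from the frozen LLM:

```python
# Targeted behavioral mutation: estimate how the behavior descriptor responds
# to each embedding coordinate, then step the embedding toward a target value.
def descriptor(emb):
    """Toy scalar behavior measure of an embedding (stand-in for an LLM run)."""
    return sum(emb) / len(emb)

def fd_gradient(f, emb, eps=1e-4):
    """Central-difference estimate of df/d(emb), one coordinate at a time."""
    grad = []
    for k in range(len(emb)):
        hi, lo = emb[:], emb[:]
        hi[k] += eps
        lo[k] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def targeted_mutation(emb, target, step=0.5):
    """Nudge the embedding so its descriptor moves toward `target`."""
    g = fd_gradient(descriptor, emb)
    err = target - descriptor(emb)
    return [x + step * err * gk for x, gk in zip(emb, g)]

emb = [0.2] * 4
child = targeted_mutation(emb, target=0.8)
```

This is gradient-free with respect to the model (no backpropagation through the LLM), which is why it applies to frozen weights, at the cost of one descriptor evaluation per perturbed coordinate.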
major comments (2)
- [Theorem 1] The coverage bounds rest on the assumption that semantic and explicit behavior descriptors remain near-independent, validated only by a single global NMI value of 0.08 ± 0.02. Because NMI is an aggregate scalar, it does not rule out localized correlations that may arise in the specific regions of behavior space populated by the evolved prompt-embedding archive; without an archive-specific or subspace-localized independence check, the formal guarantee does not necessarily apply to the observed coverage numbers.
- [Experimental Results] Experimental protocol: The headline performance claims (46.4% coverage gain, p<0.001, 30 runs, Vargha-Delaney A=0.94) are presented without the full experimental protocol, hyperparameter search details, ablation studies isolating the hybrid descriptors from the finite-difference mutation operators, or exact embedding dimension used. This absence prevents independent verification that the gains are robust and not attributable to post-hoc choices or unequal evaluation budgets.
minor comments (2)
- Replace the approximate description '~32K parameters' with the exact prompt-embedding dimension employed in all reported experiments.
- [Abstract] Clarify whether the same embedding dimension and architecture were used for both Llama-3-70B and Mistral-Large, and state the precise access requirements for the embedding interface.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive review. The comments on Theorem 1 and the experimental protocol raise important points about the strength of our formal guarantees and reproducibility. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Theorem 1] The coverage bounds rest on the assumption that semantic and explicit behavior descriptors remain near-independent, validated only by a single global NMI value of 0.08 ± 0.02. Because NMI is an aggregate scalar, it does not rule out localized correlations that may arise in the specific regions of behavior space populated by the evolved prompt-embedding archive; without an archive-specific or subspace-localized independence check, the formal guarantee does not necessarily apply to the observed coverage numbers.
Authors: We agree that a global NMI statistic, while indicative of overall low dependence, does not by itself confirm independence within the specific subspaces occupied by the QD archive. To strengthen the link between Theorem 1 and the reported coverage results, we will add an archive-specific analysis in the revised manuscript. This will include (i) partitioning the behavior space into the cells actually populated by the final archive and (ii) computing NMI within each populated cell (or small groups of adjacent cells) using the same descriptor pairs. We expect these localized values to remain low, thereby providing direct empirical support for the applicability of the coverage bound to the observed archives. The additional figures and tables will be placed in the main text or supplementary material as appropriate. revision: yes
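The per-cell analysis the authors commit to here is straightforward to implement. A sketch with a pure-Python NMI estimator and fabricated archive data (cell tags and descriptor bins are invented for illustration; the authors would use their real archive):

```python
import random
from collections import Counter
from math import log

def nmi(xs, ys):
    """Normalized mutual information of two discrete label sequences:
    NMI = I(X;Y) / sqrt(H(X) H(Y)); 0 for independence, 1 for identity."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    def h(counts):
        return -sum(c / n * log(c / n) for c in counts.values())
    i_xy = sum(c / n * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())
    hx, hy = h(px), h(py)
    return i_xy / (hx * hy) ** 0.5 if hx and hy else 0.0

# Hypothetical archive members: (cell, semantic_bin, explicit_bin).
random.seed(2)
members = [(random.randrange(3), random.randrange(4), random.randrange(4))
           for _ in range(600)]
for cell in range(3):  # localized check: NMI within each populated cell group
    sem = [s for c, s, e in members if c == cell]
    exp = [e for c, s, e in members if c == cell]
    print(cell, round(nmi(sem, exp), 3))
```

Note that finite samples bias NMI upward even under true independence, so per-cell values should be compared against a shuffled-label baseline rather than against zero.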
- Referee: [Experimental Results] Experimental protocol: The headline performance claims (46.4% coverage gain, p<0.001, 30 runs, Vargha-Delaney A=0.94) are presented without the full experimental protocol, hyperparameter search details, ablation studies isolating the hybrid descriptors from the finite-difference mutation operators, or exact embedding dimension used. This absence prevents independent verification that the gains are robust and not attributable to post-hoc choices or unequal evaluation budgets.
Authors: We acknowledge that the original submission omitted several details necessary for full reproducibility. In the revised manuscript we will expand the experimental section and add a dedicated reproducibility appendix containing: (1) the complete hyperparameter search procedure, ranges, and final selected values for all methods; (2) ablation experiments that separately disable the hybrid descriptor combination and the finite-difference mutation operator while keeping all other factors fixed; (3) the precise embedding dimension (32 768 parameters) together with the prompt template and tokenization details; and (4) explicit confirmation that all compared algorithms were allocated identical evaluation budgets (number of LLM calls). These additions will allow independent verification that the reported gains are attributable to the proposed components rather than unequal resources or post-hoc tuning. revision: yes
Circularity Check
No circularity: results are empirical benchmark comparisons with separate NMI validation for Theorem 1
full rationale
The paper reports performance via direct experiments on HumanEval, MBPP, and creative-writing benchmarks, giving 46.4% coverage and 41.4% QD-Score gains versus QDAIF with p-values and effect sizes from 30 runs. Theorem 1 supplies coverage bounds only under the separately measured assumption of descriptor near-independence (NMI = 0.08 ± 0.02); this NMI is an empirical scalar computed on the data, not a quantity defined by the coverage metric itself. No equations reduce a claimed prediction to a fitted parameter by construction, no self-citation chain is invoked to justify uniqueness or an ansatz, and the hybrid-descriptor construction is presented as an input choice whose independence is externally checked rather than tautological. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- prompt embedding dimension
axioms (1)
- domain assumption Hybrid semantic and explicit features are near-independent (NMI = 0.08 ± 0.02)
invented entities (1)
- prompt embedding as steerable neural interface (no independent evidence)