EvoPrompt: Connecting LLMs with Evolutionary Algorithms Yields Powerful Prompt Optimizers
Pith reviewed 2026-05-16 06:08 UTC · model grok-4.3
The pith
EvoPrompt uses LLMs as evolutionary operators to automatically refine prompts, outperforming human-engineered prompts by up to 25% on BIG-Bench Hard.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EvoPrompt connects large language models to evolutionary algorithms so that the models themselves implement the variation operators on discrete prompt strings. Starting from an initial population, the method repeatedly asks the LLM to recombine or mutate existing prompts, evaluates the offspring on a held-out development set, and retains the stronger performers, thereby raising task accuracy without any parameter updates or gradient signals.
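The loop described above can be sketched in a few lines. Here `llm` and `score` are hypothetical stand-ins for an LLM call and a development-set evaluation; this is a minimal sketch of the idea, not the paper's actual interface:

```python
import random

def evoprompt(llm, score, seed_prompts, generations=10, pop_size=10):
    """Sketch of an EvoPrompt-style loop (GA variant). The LLM itself acts
    as the variation operator; selection keeps the fittest prompts as scored
    on a development set. No gradients or parameter updates are involved."""
    population = [(p, score(p)) for p in seed_prompts]
    for _ in range(generations):
        (p1, _), (p2, _) = random.sample(population, 2)
        # The LLM recombines/mutates the two parent prompts into a child.
        child = llm("Cross over the two prompts and mutate the result:\n"
                    f"Prompt 1: {p1}\nPrompt 2: {p2}")
        population.append((child, score(child)))
        # Survivor selection: keep the top pop_size prompts by dev-set score.
        population.sort(key=lambda pf: pf[1], reverse=True)
        population = population[:pop_size]
    return max(population, key=lambda pf: pf[1])[0]
```

Because selection only ever compares dev-set scores, the same loop applies to API-only and locally runnable models alike.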
What carries the argument
LLM-implemented evolutionary operators (crossover and mutation) that take existing prompt strings as input and output new coherent prompt strings for the next generation.
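As an illustration, the kind of meta-prompt such an operator might use. The wording here is hypothetical; EvoPrompt's actual operator templates are given in the paper:

```python
# Hypothetical crossover+mutation meta-prompt in the spirit of the paper's
# GA variant; the paper's exact template wording differs.
OPERATOR_TEMPLATE = """Please follow the instruction step-by-step to generate a better prompt.
1. Cross over the following prompts and generate a new prompt:
Prompt 1: {p1}
Prompt 2: {p2}
2. Mutate the prompt generated in Step 1 and return the final prompt."""

def build_operator_prompt(p1: str, p2: str) -> str:
    """Fill the template with two parent prompts for the LLM to combine."""
    return OPERATOR_TEMPLATE.format(p1=p1, p2=p2)
```

The key design point is that variation happens entirely in natural-language space, so offspring remain coherent, human-readable strings.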
If this is right
- Prompts for any new task can be improved automatically from a small seed set without human rewriting.
- The same evolutionary loop works unchanged for both API-only and locally runnable LLMs.
- Performance gains appear on both understanding and generation tasks as well as on the hardest subset of BIG-Bench.
Where Pith is reading between the lines
- The same operator pattern could be applied to other discrete artifacts such as code snippets or molecular strings if suitable fitness functions are defined.
- Iterating the evolutionary process inside an agent loop might allow models to self-improve their own instruction following over multiple rounds.
- Because the method needs only a development set for selection, it offers a practical route for domains where labeled test data are scarce but a small validation split exists.
Load-bearing premise
LLMs can repeatedly generate coherent, human-readable prompts as evolutionary operators without introducing inconsistencies or quality drift that would stall improvement.
What would settle it
Run the method for ten generations on a new reasoning task and measure whether average prompt coherence (by human rating or lexical diversity) drops below the starting population while accuracy fails to rise.
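The lexical-diversity half of that check is easy to operationalize. A minimal sketch using type-token ratio as the diversity proxy (one plausible choice among several):

```python
def type_token_ratio(prompt):
    """Lexical-diversity proxy: unique tokens / total tokens.
    A crude stand-in for the human coherence rating proposed above."""
    tokens = prompt.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def diversity_drop(generation0, generation_n):
    """True if average diversity fell relative to the seed population."""
    avg = lambda prompts: sum(map(type_token_ratio, prompts)) / len(prompts)
    return avg(generation_n) < avg(generation0)
```

Running this per generation alongside dev-set accuracy would reveal whether quality drift stalls improvement.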
read the original abstract
Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 31 datasets covering language understanding, generation tasks, as well as BIG-Bench Hard (BBH) tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation (e.g., up to 25% on BBH). Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces EvoPrompt, a framework that connects large language models with evolutionary algorithms for discrete prompt optimization. It initializes a population of prompts and iteratively applies LLM-based crossover and mutation operators (with provided templates) to generate new candidates, selecting improvements based on development-set performance. Experiments across 31 datasets covering language understanding, generation, and BIG-Bench Hard tasks report that EvoPrompt outperforms human-engineered prompts and prior automatic methods such as APE, with gains reaching up to 25% on BBH for models including GPT-3.5 and Alpaca.
Significance. If the empirical results hold, the work establishes a practical synergy between LLMs and conventional evolutionary algorithms for automating prompt engineering without gradients or parameters. The explicit provision of the LLM operator templates is a clear strength that supports reproducibility and invites follow-on research on hybrid LLM-algorithm systems.
major comments (1)
- [Experiments] Experiments section: the central performance claims (outperformance on 31 datasets and up to 25% on BBH) are presented without reported statistical significance tests, standard deviations or variance across multiple runs, explicit prompt-length or token-budget controls relative to baselines, or details on the exact number of independent trials. These omissions make it difficult to assess the robustness of the reported gains.
minor comments (2)
- [Abstract] Abstract: the phrase 'up to 25% on BBH' would benefit from specifying the exact metric, baseline, and task subset to allow immediate interpretation.
- [Method] Method description: while the evolutionary loop is clearly outlined, a short pseudocode block or explicit enumeration of population size, number of generations, and selection mechanism would improve readability.
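On the selection point: the paper's reference list includes roulette-wheel selection, so one plausible instantiation of the unstated selection mechanism is fitness-proportional sampling. A sketch under that assumption, not the paper's confirmed choice:

```python
import random

def roulette_select(population, scores, k=2):
    """Fitness-proportional (roulette-wheel) parent selection: each prompt is
    drawn with probability proportional to its dev-set score. Assumes at least
    one positive score."""
    total = sum(scores)
    weights = [s / total for s in scores]
    return random.choices(population, weights=weights, k=k)
```

Stating the population size, generation count, and whether selection is roulette-wheel or tournament would resolve the referee's readability concern.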
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on experimental robustness. We address the major comment below and commit to revisions that directly strengthen the presentation of results.
read point-by-point responses
-
Referee: Experiments section: the central performance claims (outperformance on 31 datasets and up to 25% on BBH) are presented without reported statistical significance tests, standard deviations or variance across multiple runs, explicit prompt-length or token-budget controls relative to baselines, or details on the exact number of independent trials. These omissions make it difficult to assess the robustness of the reported gains.
Authors: We agree that these details are necessary for a rigorous assessment of the claims. In the revised manuscript we will rerun the key experiments (including the BBH suite and representative subsets of the 31 datasets) across at least five independent trials with different random seeds, reporting mean performance together with standard deviations. We will add paired t-tests (or Wilcoxon signed-rank tests where normality assumptions are violated) to establish statistical significance of the reported gains over baselines. We will also insert an explicit analysis of prompt length and token usage, ensuring that EvoPrompt-generated prompts are compared against baselines under comparable length/token budgets; any residual differences will be noted and discussed. The exact number of trials and the random-seed protocol will be stated clearly in the Experiments section. revision: yes
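The per-seed reporting the rebuttal commits to can be sketched with the standard library (for the significance test itself one would reach for `scipy.stats.ttest_rel` or `scipy.stats.wilcoxon`, assumed available):

```python
from statistics import mean, stdev

def paired_gain(method_scores, baseline_scores):
    """Mean and standard deviation of per-seed score differences, i.e. the
    'mean performance together with standard deviations' promised above.
    Inputs are scores from matched runs (same seed, same data split)."""
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    return mean(diffs), stdev(diffs)
```

Reporting the paired differences, rather than two independent means, is what makes the five-trial protocol informative at small sample sizes.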
Circularity Check
No significant circularity detected
full rationale
The manuscript describes an empirical framework (EvoPrompt) that applies LLMs as crossover and mutation operators within an evolutionary loop over discrete prompts, with fitness evaluated on held-out development sets. No equations, first-principles derivations, or parameter-fitting steps are present that would reduce reported performance gains to quantities defined inside the method itself. All central claims rest on external benchmark results across 31 datasets, with explicit prompt templates supplied for the evolutionary operators, enabling independent reproduction. The approach therefore contains no self-definitional, fitted-input, or self-citation-load-bearing reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can perform coherent crossover and mutation on discrete natural-language prompts while preserving readability and task relevance.
Forward citations
Cited by 20 Pith papers
-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
DSPy compiles short declarative programs into LM pipelines that self-optimize and outperform both standard few-shot prompting and expert-written chains on math, retrieval, and QA tasks.
-
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard...
-
TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments
TSCG compiles JSON tool schemas into token-efficient structured text, raising tool-use accuracy for small LLMs from 0% to 84.4% on benchmarks while cutting tokens by 52-57%.
-
Beyond Explicit Refusals: Soft-Failure Attacks on Retrieval-Augmented Generation
DEJA uses evolutionary optimization guided by an LLM-based Answer Utility Score to induce soft-failure responses in RAG systems, achieving over 79% soft attack success rate with under 15% hard failures and high stealt...
-
Self-Correcting RAG: Enhancing Faithfulness via MMKP Context Selection and NLI-Guided MCTS
Self-Correcting RAG formalizes retrieval as MMKP to maximize information density under token limits and uses NLI-guided MCTS to validate faithfulness, raising accuracy and cutting hallucinations on six multi-hop QA an...
-
PromptEvolver: Prompt Inversion through Evolutionary Optimization in Natural-Language Space
PromptEvolver recovers high-fidelity natural language prompts for given images by evolving them via genetic algorithm guided by a vision-language model, outperforming prior methods on benchmarks.
-
Large Language Models as Optimizers
Large language models can optimize by being prompted with histories of past solutions and scores to propose better ones, producing prompts that raise accuracy up to 8% on GSM8K and 50% on Big-Bench Hard over human-des...
-
OpenDeepThink: Parallel Reasoning via Bradley--Terry Aggregation
OpenDeepThink improves LLM reasoning by ranking parallel candidate traces via Bradley-Terry aggregation of LLM pairwise judgments, achieving a +405 Codeforces Elo gain on Gemini 3.1 Pro after eight rounds.
-
Learning, Fast and Slow: Towards LLMs That Adapt Continually
Fast-Slow Training combines slow parameter updates with fast context optimization to achieve up to 3x better sample efficiency, higher performance, less forgetting, and preserved plasticity in continual LLM learning.
-
EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...
-
FitText: Evolving Agent Tool Ecologies via Memetic Retrieval
FitText embeds memetic evolutionary retrieval inside the agent's reasoning loop to iteratively refine pseudo-tool descriptions, raising retrieval rank from 8.81 to 2.78 on ToolRet and pass rate to 0.73 on StableToolBench.
-
AgentGA: Evolving Code Solutions in Agent-Seed Space
AgentGA uses a genetic algorithm to evolve agent seeds and achieves 74.52% human-exceeding performance on tabular AutoML tasks versus 54.15% for the AIDE baseline.
-
AgentGA: Evolving Code Solutions in Agent-Seed Space
AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.
-
Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems
Prompt optimization in compound AI systems is statistically indistinguishable from random chance except when tasks have exploitable output structure; a two-stage diagnostic predicts success.
-
Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs
MemJack achieves 71.48% attack success rate on unmodified COCO val2017 images against Qwen3-VL-Plus by coordinating agents to map visual entities to malicious intents, apply multi-angle camouflage, and filter refusals...
-
AI-Driven Research for Databases
Co-evolving LLM-generated solutions with their evaluators enables discovery of novel database algorithms that outperform state-of-the-art baselines, including a query rewrite policy with up to 6.8x lower latency.
-
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
-
GEAR: Genetic AutoResearch for Agentic Code Evolution
GEAR applies genetic algorithms to maintain and evolve multiple research states in autonomous code agents, outperforming single-path baselines by continuing to discover improvements over extended runs.
-
Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction
Small open-weight language models can self-optimize prompts for clinical named entity recognition in dental notes, reaching micro F1 of 0.864 after DPO on Qwen2.5-14B.
-
Feedback Over Form: Why Execution Feedback Matters More Than Pipeline Topology in 1-3B Code Generation
Execution feedback in refinement loops improves 1-3B code generation performance far more than complex pipeline topologies discovered via evolutionary search on HumanEval and sanitized MBPP.
Reference graph
Works this paper leans on
- [1] Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, and Lucia Specia. ASSET: A dataset for tuning and evaluation of sentence simplification models with multiple rewriting transformations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 4668–4679, 2020.
- [2] Stephen Bach, Victor Sanh, Zheng Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Févry, et al. PromptSource: An integrated development environment and repository for natural language prompts. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2022.
- [3] Janez Brest, Sašo Greiner, Borko Bošković, Marjan Mernik, and Viljem Žumer. Self-adapting control parameters in differential evolution: A comparative study on numerical benchmark problems. IEEE Transactions on Evolutionary Computation, 10(6): 646–657, 2006.
- [4] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33: 1877–1901, 2020.
- [6] Andrew R. Conn, Katya Scheinberg, and Luis N. Vicente. Introduction to derivative-free optimization. SIAM, 2009.
- [7] Swagatam Das and Ponnuthurai Nagaratnam Suganthan. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15(1): 4–31, 2010.
- [8] Swagatam Das, Sankha Subhra Mullick, and Ponnuthurai N. Suganthan. Recent advances in differential evolution – an updated survey. Swarm and Evolutionary Computation, 27: 1–30, 2016.
- [9] Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric Xing, and Zhiting Hu. RLPrompt: Optimizing discrete text prompts with reinforcement learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 3369–3391, 2022.
- [10] Marco Dorigo and Luca Maria Gambardella. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1): 53–66, 1997.
- [14] John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, 1975. ISBN 0262581116.
- [15] John H. Holland. Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT Press, 1992.
- [16] Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In KDD, pp. 168–177, 2004.
- [18] Zhengbao Jiang, Frank F. Xu, Jun Araki, and Graham Neubig. How can we know what language models know? Transactions of the Association for Computational Linguistics, 8: 423–438, 2020.
- [19] James Kennedy and Russell Eberhart. Particle swarm optimization. In Proceedings of ICNN'95 International Conference on Neural Networks, volume 4, pp. 1942–1948. IEEE, 1995.
- [20] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35: 22199–22213, 2022.
- [23] Brian Lester, Rami Al-Rfou, and Noah Constant. The power of scale for parameter-efficient prompt tuning. In EMNLP, pp. 3045–3059, 2021.
- [25] Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 4582–4597, 2021.
- [26] Adam Lipowski and Dorota Lipowska. Roulette-wheel selection via stochastic acceptance. Physica A: Statistical Mechanics and its Applications, 391(6): 2193–2196, 2012.
- [27] Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9): 1–35, 2023.
- [31] Seyedali Mirjalili, Jin Song Dong, Ali Safa Sadiq, and Hossam Faris. Genetic algorithm: Theory, literature review, and application in image reconstruction. Nature-Inspired Optimizers: Theories, Literature Reviews and Applications, pp. 69–85, 2020.
- [32] Swaroop Mishra, Daniel Khashabi, Chitta Baral, Yejin Choi, and Hannaneh Hajishirzi. Reframing instructional prompts to GPTk's language. In Findings of the Association for Computational Linguistics: ACL 2022, pp. 589–612, 2022a.
- [33] Swaroop Mishra, Daniel Khashabi, Chitta Baral, and Hannaneh Hajishirzi. Cross-task generalization via natural language crowdsourcing instructions. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 3470–3487, 2022b.
- [34] Swaroop Mishra, Daniel Khashabi, Chitta Baral, and Hannaneh Hajishirzi. Cross-task generalization via natural language crowdsourcing instructions. In ACL, 2022c.
- [35] Melanie Mitchell. An introduction to genetic algorithms. MIT Press, 1998.
- [38] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35: 27730–27744, 2022.
- [39] Bo Pang. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, 2005.
- [40] Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pp. 271–278, 2004.
- [41] Millie Pant, Hira Zaheer, Laura Garcia-Hernandez, Ajith Abraham, et al. Differential evolution: A review of more than two decades of research. Engineering Applications of Artificial Intelligence, 90: 103479, 2020.
- [43] Kenneth V. Price. Differential evolution. In Handbook of Optimization: From Classical to Modern Approach, pp. 187–214. Springer, 2013.
- [46] Luis Miguel Rios and Nikolaos V. Sahinidis. Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization, 56: 1247–1293, 2013.
- [48] Timo Schick and Hinrich Schütze. Exploiting cloze-questions for few-shot text classification and natural language inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 255–269, 2021.
- [51] Taylor Shin, Yasaman Razeghi, Robert L. Logan IV, Eric Wallace, and Sameer Singh. AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4222–4235, 2020.
- [52] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP, pp. 1631–1642, 2013.
- [53] Rainer Storn and Kenneth Price. Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11: 341–359, 1997.
- [57] Jakob Vesterstrom and Rene Thomsen. A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No. 04TH8753), volume 2, pp. 1980–1987. IEEE, 2004.
- [58] Ellen M. Voorhees and Dawn M. Tice. Building a question answering test collection. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 200–207, 2000.
- [59] Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, and Sameer Singh. Universal adversarial triggers for attacking and analyzing NLP. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2153–2162, 2019.
- [61] Wikipedia contributors. Tournament selection. https://en.wikipedia.org/w/index.php?title=Tournament_selection&oldid=1160627612, 2023. [Online; accessed 26-September-2023].
- [62] Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze Chen, and Chris Callison-Burch. Optimizing statistical machine translation for text simplification. Transactions of the Association for Computational Linguistics, 4: 401–415, 2016.
- [63] JD Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, and Qian Yang. Why Johnny can't prompt: how non-AI experts try (and fail) to design LLM prompts. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pp. 1–21, 2023.
- [64] Jingqiao Zhang and Arthur C. Sanderson. JADE: Adaptive differential evolution with optional external archive. IEEE Transactions on Evolutionary Computation, 13(5): 945–958, 2009. doi:10.1109/TEVC.2009.2014613.
- [65] Ningyu Zhang, Luoqiu Li, Xiang Chen, Shumin Deng, Zhen Bi, Chuanqi Tan, Fei Huang, and Huajun Chen. Differentiable prompt makes pre-trained language models better few-shot learners. In International Conference on Learning Representations, 2021.
- [67] Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, and Joseph E. Gonzalez. TEMPERA: Test-time prompt editing via reinforcement learning. In The Eleventh International Conference on Learning Representations, 2023.
- [69] Xiang Zhang, Junbo Zhao, and Yann LeCun. Character-level convolutional networks for text classification. NeurIPS, 28, 2015.
- [72] Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, and Jimmy Ba. Large language models are human-level prompt engineers. In The Eleventh International Conference on Learning Representations, 2022.
- [89] OPT: Open Pre-trained Transformer Language Models. arXiv preprint arXiv:2205.01068, 2022.
- [90] Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021.
- [96] Srinivasan Iyer, Xi Victoria Lin, Ramakanth Pasunuru, Todor Mihaylov, Daniel Simig, Ping Yu, Kurt Shuster, Tianlu Wang, Qing Liu, Punit Singh Koura, Xian Li, Brian O'Horo, Gabriel Pereyra, Jeff Wang, Christopher Dewan, Asli Celikyilmaz, Luke Zettlemoyer, and Ves Stoyanov. OPT-IML: Scaling language model instruction meta learning through the lens of generalization. arXiv preprint arXiv:2212.12017, 2022.
- [97] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020.
- [101] DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web, 2015.
- [106] Tianxiang Sun, Zhengfu He, Hong Qian, Yunhua Zhou, Xuanjing Huang, and Xipeng Qiu. BBTv2: Towards a Gradient-Free Future with Large Language Models. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022.
- [107] Black-box tuning for language-model-as-a-service. In International Conference on Machine Learning, 2022.