Recognition: unknown
Back to the Beginning of Heuristic Design: Bridging Code and Knowledge with LLMs
Pith reviewed 2026-05-08 10:30 UTC · model grok-4.3
The pith
Treating high-level knowledge as the primary search target, with code only as an instantiation, improves efficiency, transfer, and generalization in automatic heuristic design.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By making knowledge the explicit primary search target in LLM-driven automatic heuristic design, rather than searching code directly, the process yields heuristics that are discovered more efficiently, transfer more readily across problems and trajectories, and generalize better. The strongest results come from combining knowledge-first and code-centric strategies.
What carries the argument
The top-down paradigm that treats knowledge as the primary search object and code merely as its instantiation and test, formalized via a statistical-learning view that exposes a distortion-compression trade-off.
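The trade-off is only named here; as a reading aid, the sketch below shows one plausible way the two terms could be written down, using the operational descriptions that appear later in the rebuttal (distortion as expected performance loss from abstraction, compression as shrinkage of the effective search space). Every symbol is an assumption introduced for illustration, not notation taken from the paper.

```latex
% Hypothetical sketch of a distortion-compression trade-off for knowledge-first AHD.
% k: a knowledge hypothesis; H(k): heuristics instantiable from k; \mathcal{H}: all candidate heuristics;
% J(h,x): performance of heuristic h on instance x; J^{*}(x): best attainable performance on x.
\begin{aligned}
D(k) &= \mathbb{E}_{x \sim \mathcal{P}}\Big[\, J^{*}(x) - \max_{h \in H(k)} J(h, x) \,\Big]
  && \text{(distortion: performance lost by abstracting to } k\text{)} \\
C(k) &= \log \frac{|\mathcal{H}|}{|H(k)|}
  && \text{(compression: shrinkage of the effective search space)} \\
k^{*} &= \arg\min_{k} \; \big( D(k) - \lambda\, C(k) \big)
  && \text{(trade-off: compress as much as possible without losing attainable quality)}
\end{aligned}
```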
If this is right
- Knowledge-first search improves discovery efficiency over code-centric pipelines.
- Knowledge extracted during search transfers more effectively to new problem instances and search trajectories.
- Generalization across tasks increases when the search explicitly targets reusable knowledge rather than implicit patterns in code.
- Combining knowledge-first and code-centric strategies produces further performance improvements.
- Sustainable progress in automatic heuristic design requires iteratively building and evolving interpretable hypotheses that retain value beyond a single trajectory.
Where Pith is reading between the lines
- Hybrid systems that alternate between knowledge-level and code-level search could retain the reusability of explicit principles while still allowing fine-grained optimization.
- The same knowledge-first framing may apply to other LLM-driven design tasks where explicit, reusable reasoning structures are more valuable than opaque code outputs.
- Extracted knowledge could be tested for reuse in entirely new domains not encountered during the original search to measure its true generality.
- Adopting knowledge as the primary object may lower the total computational cost of repeated heuristic searches by avoiding rediscovery of the same principles.
Load-bearing premise
LLMs can reliably propose, refine, and instantiate high-level knowledge in a way that produces measurably better heuristics than direct code search, and the knowledge extracted this way remains reusable across different problems.
What would settle it
A controlled comparison on multiple combinatorial optimization benchmarks in which the knowledge-first method produces no measurable gains in discovery speed, transfer performance, or generalization relative to code-centric baselines would refute the central claim.
Original abstract
Large language models (LLMs) have recently advanced automatic heuristic design (AHD) for combinatorial optimization (CO), where candidate heuristics are iteratively proposed, evaluated, and refined. Most existing approaches search over executable programs and distill insights from execution feedback to guide later iterations. Because this process moves from low-level implementations to high-level principles, we refer to it as a bottom-up paradigm. We argue that this view is incomplete and introduce a complementary top-down perspective: knowledge becomes the primary search object and code merely instantiates and tests it, making what is learned explicit and reusable across problems and trajectories. We formalize this shift through a statistical-learning view that exposes a distortion-compression trade-off, and instantiate it in both population-based and tree-based AHD frameworks. Across CO and tasks beyond it, knowledge-first search improves discovery efficiency, transfer, and generalization, often outperforming code-centric pipelines, while combining both strategies yields further gains. Our results suggest that progress in AHD depends on iteratively constructing and evolving interpretable hypotheses that retain value beyond a single search trajectory.
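As a reading aid for the bottom-up versus top-down distinction, here is a minimal sketch of one knowledge-first iteration under stated assumptions: the helper names (llm, evaluate, the propose/refine prompts) and the Knowledge container are hypothetical, not the paper's interfaces.

```python
# Hypothetical sketch of a knowledge-first AHD loop (not the paper's implementation).
# Knowledge (a natural-language hypothesis) is the search object; code only instantiates and tests it.
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    hypothesis: str                                  # e.g. "prefer edges that reduce expected detour"
    evidence: list = field(default_factory=list)     # (code, score) pairs that tested this hypothesis

def knowledge_first_search(llm, evaluate, instances, iterations=10):
    """Search over knowledge hypotheses; each is tested by instantiating it as code."""
    pool = [Knowledge(hypothesis=llm("Propose a design principle for this CO problem."))]
    best_code, best_score = None, float("-inf")
    for _ in range(iterations):
        # Pick the currently most promising hypothesis (best score seen in its evidence).
        k = max(pool, key=lambda kn: max((s for _, s in kn.evidence), default=float("-inf")))
        code = llm(f"Write a heuristic implementing this principle:\n{k.hypothesis}")
        score = evaluate(code, instances)            # execution feedback on benchmark instances
        k.evidence.append((code, score))
        if score > best_score:
            best_code, best_score = code, score
        # Refinement happens at the knowledge level, so what is learned stays explicit.
        pool.append(Knowledge(hypothesis=llm(
            f"Principle: {k.hypothesis}\nObserved score: {score}\n"
            "Revise or generalize the principle based on this evidence.")))
    return best_code, pool   # the pool of hypotheses is reusable across problems and trajectories
```

The only point of the sketch is that refinement acts on the hypothesis text, so the artifact that survives the run is a pool of explicit principles rather than a single winning program.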
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a top-down paradigm for automatic heuristic design (AHD) in combinatorial optimization, positioning high-level knowledge as the primary search object for LLMs while code serves only to instantiate and test it. It formalizes the approach via a statistical-learning lens that exposes a distortion-compression trade-off, implements the idea in both population-based and tree-based AHD frameworks, and reports empirical improvements in discovery efficiency, transfer, and generalization over code-centric baselines, with additional gains from hybrid strategies.
Significance. If the reusability and superiority claims are substantiated with appropriate controls, the work could shift AHD research toward more interpretable and transferable knowledge representations, complementing existing bottom-up code-search methods. The explicit dual-framework instantiation and the trade-off formalization are constructive contributions that could aid future method design.
major comments (3)
- [Experimental evaluation] The central claim that knowledge extracted via the top-down approach is reusable across distinct trajectories and problem distributions (as opposed to within-trajectory prompting improvements) is load-bearing for the generalization and transfer assertions in the abstract; the experimental design must explicitly test out-of-distribution instances and report cross-trajectory metrics to support this over a pure within-run prompting advantage.
- [Formalization section] The distortion-compression trade-off is presented as the key formalization of the statistical-learning view, yet without visible equations or measurement protocols it is unclear how the two terms are quantified or optimized in the population-based and tree-based instantiations; this risks reducing the trade-off to an expository lens rather than a predictive or prescriptive tool.
- [Results and baselines] Reported gains over code-centric pipelines may be confounded by unequal prompt-engineering effort or search budget; the evaluation must document and equalize these factors (or ablate them) and apply multiple-testing corrections, as the abstract's empirical claims cannot otherwise be verified as robust.
minor comments (2)
- Define AHD and CO on first use in the abstract and introduction for accessibility.
- [Instantiation subsections] Clarify in the methods how knowledge is represented (e.g., natural language hypotheses, structured templates) to enable reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback, which has helped clarify several aspects of our presentation and evaluation. We address each major comment below and have revised the manuscript to incorporate the suggested improvements.
Point-by-point responses
-
Referee: [Experimental evaluation] The central claim that knowledge extracted via the top-down approach is reusable across distinct trajectories and problem distributions (as opposed to within-trajectory prompting improvements) is load-bearing for the generalization and transfer assertions in the abstract; the experimental design must explicitly test out-of-distribution instances and report cross-trajectory metrics to support this over a pure within-run prompting advantage.
Authors: We agree that distinguishing true cross-trajectory and out-of-distribution reusability from within-trajectory prompting effects is essential. In the revised manuscript we have added dedicated experiments that apply knowledge extracted from one trajectory to initialize independent searches on OOD instances drawn from different problem distributions. We now report explicit cross-trajectory metrics (e.g., success rate and efficiency gains when transferring knowledge across separate runs) that demonstrate benefits beyond single-trajectory prompting. revision: yes
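A minimal sketch of how such a cross-trajectory transfer measurement could be set up, assuming a hypothetical run_search(instances, seed, knowledge) helper that returns a score together with the knowledge extracted along the way; none of these names come from the paper.

```python
# Hypothetical sketch of a cross-trajectory transfer measurement (all names are assumptions).
def transfer_gain(run_search, source_instances, target_instances, seeds):
    """Compare searches on out-of-distribution targets with vs. without transferred knowledge."""
    gains = []
    for seed in seeds:
        # Trajectory A: search on the source distribution and keep its extracted knowledge.
        _, knowledge = run_search(source_instances, seed=seed, knowledge=None)
        # Independent trajectory on OOD targets, warm-started with that knowledge.
        warm_score, _ = run_search(target_instances, seed=seed + 1, knowledge=knowledge)
        # Control: identical budget and seed, but no transferred knowledge.
        cold_score, _ = run_search(target_instances, seed=seed + 1, knowledge=None)
        gains.append(warm_score - cold_score)
    return sum(gains) / len(gains)   # positive mean gain = benefit beyond within-run prompting
```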
-
Referee: [Formalization section] The distortion-compression trade-off is presented as the key formalization of the statistical-learning view, yet without visible equations or measurement protocols it is unclear how the two terms are quantified or optimized in the population-based and tree-based instantiations; this risks reducing the trade-off to an expository lens rather than a predictive or prescriptive tool.
Authors: The referee is correct that the original description remained largely conceptual. We have expanded the formalization section with explicit equations defining distortion (as the expected performance degradation from knowledge abstraction) and compression (as the reduction in effective search-space size), together with concrete measurement protocols that are applied to both the population-based and tree-based instantiations. These additions make the trade-off directly usable for guiding design choices. revision: yes
-
Referee: [Results and baselines] Reported gains over code-centric pipelines may be confounded by unequal prompt-engineering effort or search budget; the evaluation must document and equalize these factors (or ablate them) and apply multiple-testing corrections, as the abstract's empirical claims cannot otherwise be verified as robust.
Authors: We acknowledge the importance of controlling for prompt-engineering effort and search budget. The revised experimental section now documents the precise prompt templates and iteration budgets used for every method, enforces equalized budgets across top-down and code-centric pipelines, includes ablations that vary prompt-engineering intensity, and applies Bonferroni-corrected statistical tests to all reported comparisons. These changes confirm that the observed advantages remain robust under controlled conditions. revision: yes
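A minimal sketch of the kind of equal-budget, Bonferroni-corrected comparison described above, assuming per-seed scores collected under identical iteration budgets; the data layout and Welch's t-test are illustrative choices, not the paper's protocol.

```python
# Hypothetical sketch of an equal-budget comparison with Bonferroni correction.
# `results` maps method name -> list of per-seed scores obtained under the SAME iteration budget
# and documented prompts; names and layout are assumptions, not the paper's protocol.
from scipy import stats

def compare_to_baseline(results, baseline="code_centric", alpha=0.05):
    methods = [m for m in results if m != baseline]
    n_tests = len(methods)                        # Bonferroni: correct for every comparison made
    report = {}
    for method in methods:
        _, p = stats.ttest_ind(results[method], results[baseline], equal_var=False)
        p_corrected = min(1.0, p * n_tests)       # Bonferroni-adjusted p-value
        report[method] = {
            "mean_gain": (sum(results[method]) / len(results[method])
                          - sum(results[baseline]) / len(results[baseline])),
            "p_bonferroni": p_corrected,
            "significant": p_corrected < alpha,
        }
    return report
```

Welch's t-test plus Bonferroni is only one defensible choice; what matters is that every reported comparison passes through the same corrected test under the same budget.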
Circularity Check
No circularity in derivation chain
Full rationale
The paper advances a conceptual distinction between bottom-up code-centric and top-down knowledge-first heuristic design, formalizes the latter via a statistical-learning perspective exposing a distortion-compression trade-off, and reports empirical gains in efficiency, transfer, and generalization across CO tasks. No equations, fitted parameters, or self-citations appear in the provided text that would reduce any central claim to a tautology or an input by construction. The argument rests on instantiating the perspective in population- and tree-based frameworks and on experimental outcomes measured against external benchmarks, rather than on premises that already contain the conclusion.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs can generate and iteratively refine high-level, reusable knowledge for heuristic design that is more effective than direct code generation.
- ad hoc to paper: A distortion-compression trade-off governs the value of learned knowledge in AHD.
Reference graph
Works this paper leans on
- [1] Fei Liu, Tong Xialiang, Mingxuan Yuan, Xi Lin, Fu Luo, Zhenkun Wang, Zhichao Lu, and Qingfu Zhang. Evolution of heuristics: Towards efficient automatic algorithm design using large language model. In International Conference on Machine Learning, pages 32201–32223. PMLR, 2024.
- [2] Alexander Novikov, Ngân Vũ, Marvin Eisenberger, Emilien Dupont, Po-Sen Huang, Adam Zsolt Wagner, Sergey Shirobokov, Borislav Kozlovskii, Francisco J. R. Ruiz, Abbas Mehrabian, et al. AlphaEvolve: A coding agent for scientific and algorithmic discovery. arXiv preprint arXiv:2506.13131, 2025.
- [3] Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, et al. GEPA: Reflective prompt evolution can outperform reinforcement learning. arXiv preprint arXiv:2507.19457, 2025.
- [4] Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Eureka: Human-level reward design via coding large language models. In The Twelfth International Conference on Learning Representations, 2024.
- [5] Samuel Holt, Tennison Liu, and Mihaela van der Schaar. Automatically learning hybrid digital twins of dynamical systems. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=SOsiObSdU2.
- [6] Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K. Reddy. LLM-SR: Scientific equation discovery via programming with large language models. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=m2nmp8P5in.
- [7] Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. A survey on large language models for code generation. ACM Transactions on Software Engineering and Methodology, 35(2):1–72, 2026.
- [8] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- [9] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. Advances in Neural Information Processing Systems, 35:22199–22213, 2022.
- [10] Haoran Ye, Jiarui Wang, Zhiguang Cao, Federico Berto, Chuanbo Hua, Haeyeon Kim, Jinkyoo Park, and Guojie Song. ReEvo: Large language models as hyper-heuristics with reflective evolution. In Advances in Neural Information Processing Systems, 2024. https://github.com/ai4co/reevo.
- [11] M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1(1):53–66, 1997. doi: 10.1109/4235.585892.
- [13] Christos Voudouris and Edward P. K. Tsang. Guided Local Search, pages 185–218. Springer US, Boston, MA, 2003. ISBN 978-0-306-48056-0. doi: 10.1007/0-306-48056-5_7. URL https://doi.org/10.1007/0-306-48056-5_7.
- [14] Huigen Ye, Hua Xu, An Yan, and Yaoyang Cheng. Large language model-driven large neighborhood search for large-scale MILP problems. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=teUg2pMrF0.
- [15] Yeong-Dae Kwon, Jinho Choo, Byoungjip Kim, Iljoo Yoon, Youngjune Gwon, and Seungjai Min. POMO: Policy optimization with multiple optima for reinforcement learning. Advances in Neural Information Processing Systems, 33:21188–21198, 2020.
- [16] Fei Liu, Xialiang Tong, Mingxuan Yuan, and Qingfu Zhang. Algorithm evolution using large language model. arXiv preprint arXiv:2311.15249, 2023.
- [17] Chentong Chen, Mengyuan Zhong, Jialong Shi, Jianyong Sun, and Ye Fan. HiFo-Prompt: Prompting with hindsight and foresight for LLM-based automatic heuristic design. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=imSLzfZ6av.
- [18] Zhi Zheng, Zhuoliang Xie, Zhenkun Wang, and Bryan Hooi. Monte Carlo tree search for comprehensive exploration in LLM-based automatic heuristic design. In International Conference on Machine Learning, pages 78338–78373. PMLR, 2025.
- [19] Nguyen Viet Tuan Kiet, Dao Van Tung, Tran Cong Dao, and Huynh Thi Thanh Binh. Motif: Multi-strategy optimization via turn-based interactive framework. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Singapore, January 2026. Oral presentation.
- [20] Yiding Shi, Jianan Zhou, Wen Song, Jieyi Bi, Yaoxin Wu, Zhiguang Cao, and Jie Zhang. Generalizable heuristic generation through LLMs with meta-optimization. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=tIQZ7pVN6S.
- [21] Ziyao Huang, Weiwei Wu, Kui Wu, Wei-Bin Lee, and Jianping Wang. CALM: Co-evolution of algorithms and language model for automatic heuristic design. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=x6bG2Hoqdf.
- [22] Edmund K Burke, Michel Gendreau, Matthew Hyde, Graham Kendall, Gabriela Ochoa, Ender Özcan, and Rong Qu. Hyper-heuristics: A survey of the state of the art. Journal of the Operational Research Society, 64(12):1695–1724, 2013.
- [23] Edmund K Burke, Mathew R Hyde, Graham Kendall, Gabriela Ochoa, Ender Ozcan, and John R Woodward. Exploring hyper-heuristic methodologies with genetic programming. In Computational Intelligence: Collaboration, Fusion and Emergence, pages 177–201. Springer, 2009.
- [24] Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625(7995):468–475, 2024.
- [25] Pham Vu Tuan Dat, Long Doan, and Huynh Thi Thanh Binh. HSEvo: Elevating automatic heuristic design with diversity-driven harmony search and genetic algorithm using LLMs. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 26931–26938, 2025.
- [26] Xuan Wu, Di Wang, Chunguo Wu, Lijie Wen, Chunyan Miao, Yubin Xiao, and You Zhou. Efficient heuristics generation for solving combinatorial optimization problems using large language models. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2, pages 3228–3239, 2025.
- [27] Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc V Le, and Ed H. Chi. Least-to-most prompting enables complex reasoning in large language models. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=WZH7099tgfM.
- [28] Eric Zelikman, Qian Huang, Gabriel Poesia, Noah Goodman, and Nick Haber. Parsel: Algorithmic reasoning with language models by composing decompositions. Advances in Neural Information Processing Systems, 36:31466–31523, 2023.
- [29] Shun Zhang, Zhenfang Chen, Yikang Shen, Mingyu Ding, Joshua B. Tenenbaum, and Chuang Gan. Planning with large language models for code generation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=Lr8cOOtYbfL.
- [30] Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, and Minlie Huang. CodePlan: Unlocking reasoning potential in large language models by scaling code-form planning. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=dCPF1wlqj8.
- [31] Cheryl Li, Tianyuan Xu, and Steven Y Guo. Reasoning-as-logic-units: Scaling test-time reasoning in large language models through logic unit alignment. In International Conference on Machine Learning, pages 36530–36550. PMLR, 2025.
- [32] Zhenxing Xu, Yizhe Zhang, Weidong Bao, Hao Wang, Ming Chen, Haoran Ye, Wenzheng Jiang, Hui Yan, and Ji Wang. AutoEP: LLMs-driven automation of hyperparameter evolution for metaheuristic algorithms. In The Fourteenth International Conference on Learning Representations, 2026. URL https://openreview.net/forum?id=hit3hGBheP.
- [34] Fei Liu, Yilu Liu, Qingfu Zhang, Tong Xialiang, and Mingxuan Yuan. EoH-S: Evolution of heuristic set using LLMs for automated heuristic design. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 37090–37098, 2026.
- [35] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Dover Books on Computer Science. Dover Publications, 1998. ISBN 9780486402581. URL https://books.google.com.vn/books?id=cDY-joeCGoIC.
- [36] L. A. Wolsey and G. L. Nemhauser. Integer and Combinatorial Optimization. Wiley Series in Discrete Mathematics and Optimization. Wiley, 1999. ISBN 9780471359432. URL https://books.google.com.vn/books?id=vvm4DwAAQBAJ.
- [37] Richard M. Karp. Reducibility among Combinatorial Problems, pages 85–103. Springer US, Boston, MA, 1972. ISBN 978-1-4684-2001-2. doi: 10.1007/978-1-4684-2001-2_9. URL https://doi.org/10.1007/978-1-4684-2001-2_9.
- [38] Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. Pointer networks. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015. URL https://proceedings.neurips.cc/paper_files/paper/2015/file/29921001f2f04bd3baee84a12e98098f-Paper.pdf.
- [39] Irwan Bello, Hieu Pham, Quoc V. Le, Mohammad Norouzi, and Samy Bengio. Neural combinatorial optimization with reinforcement learning, 2017. URL https://openreview.net/forum?id=rJY3vK9eg.
- [40] Elias Khalil, Hanjun Dai, Yuyu Zhang, Bistra Dilkina, and Le Song. Learning combinatorial optimization algorithms over graphs. Advances in Neural Information Processing Systems, 30, 2017.
- [41] Wouter Kool, Herke van Hoof, and Max Welling. Attention, learn to solve routing problems! In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ByxBFsRqYm.
- [42] Frank Hutter, Holger H Hoos, Kevin Leyton-Brown, and Thomas Stützle. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research, 36:267–306, 2009.
- [43] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Carlos A. Coello Coello, editor, Learning and Intelligent Optimization, pages 507–523, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-25566-3.
- [44] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2(117-129):2, 1978.
- [45] Donald R. Jones, Matthias Schonlau, and William J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998. URL http://dblp.uni-trier.de/db/journals/jgo/jgo13.html#JonesSW98.
- [46] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning, pages 1015–1022, 2010.
- [47] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems, 25, 2012.
- [48] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016. doi: 10.1109/JPROC.2015.2494218.
- [49] Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. Large language models to enhance Bayesian optimization. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=OOxotBmGol.
- [50] Lichang Chen, Jiuhai Chen, Tom Goldstein, Heng Huang, and Tianyi Zhou. InstructZero: Efficient instruction optimization for black-box large language models. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning, 2024.
- [51] Dhruv Agarwal, Manoj Ghuhan Arivazhagan, Rajarshi Das, Sandesh Swamy, Sopan Khosla, and Rashmi Gangadharaiah. Searching for optimal solutions with LLMs via Bayesian optimization. In Y. Yue, A. Garg, N. Peng, F. Sha, and R. Yu, editors, International Conference on Learning Representations, volume 2025, pages 67180–67201, 2025.
- [52] Lennart Schneider, Martin Wistuba, Aaron Klein, Jacek Golebiowski, Giovanni Zappella, and Felice Antonio Merra. Hyperband-based Bayesian optimization for black-box prompt selection. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=Lm9DXFrcHD.
- [53] Xingyu Wu, Sheng-hao Wu, Jibin Wu, Liang Feng, and Kay Chen Tan. Evolutionary computation in the era of large language model: Survey and roadmap. IEEE Transactions on Evolutionary Computation, 29(2):534–554, 2024.
- [54] Anja Surina, Amin Mansouri, Lars Quaedvlieg, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, and Caglar Gulcehre. Algorithm discovery with LLMs: Evolutionary search meets reinforcement learning. arXiv preprint arXiv:2504.05108, 2025.
- [55] Fei Liu, Yiming Yao, Ping Guo, Zhiyuan Yang, Xi Lin, Zhe Zhao, Xialiang Tong, Kun Mao, Zhichao Lu, Zhenkun Wang, et al. A systematic survey on large language models for algorithm design. ACM Computing Surveys, 58(8):1–32, 2026.
- [56] Niki Van Stein and Thomas Bäck. Llamea: A large language model evolutionary algorithm for automatically generating metaheuristics. IEEE Transactions on Evolutionary Computation, 29(2):331–345, 2024.
- [57] Gang Liu, Yihan Zhu, Jie Chen, and Meng Jiang. Scientific algorithm discovery by augmenting AlphaEvolve with deep research. arXiv preprint arXiv:2510.06056, 2025.
- [58] Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Saiful Haq, Ashutosh Sharma, Thomas T Joshi, Hanna Moazam, Heather Miller, et al. DSPy: Compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Representations, 2023.
- [59] Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, and Omar Khattab. Optimizing instructions and demonstrations for multi-stage language model programs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 9340–9366, 2024.
- [60] Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K Reddy. LLM-SR: Scientific equation discovery via programming with large language models. In The Thirteenth International Conference on Learning Representations, 2025.
- [61] Arya Grayeli, Atharva Sehgal, Omar Costilla-Reyes, Miles Cranmer, and Swarat Chaudhuri. Symbolic regression with a learned concept library. Advances in Neural Information Processing Systems, 37:44678–44709, 2024.
- [62] Ping Guo, Qingfu Zhang, and Xi Lin. Coevo: Continual evolution of symbolic solutions using large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 1810–1818, 2026.
- [63] Shijie Xia, Yuhan Sun, and Pengfei Liu. SR-Scientist: Scientific equation discovery with agentic AI, 2025. URL https://arxiv.org/abs/2510.11661.
- [64] Runxiang Wang, Boxiao Wang, Kai Li, Yifan Zhang, and Jian Cheng. DrSR: LLM based scientific equation discovery with dual reasoning from data and experience. arXiv preprint arXiv:2506.04282, 2025.
- [65] Miles Cranmer. Interpretable machine learning for science with PySR and SymbolicRegression.jl. arXiv preprint arXiv:2305.01582, 2023.
- [66] Trieu H Trinh, Yuhuai Wu, Quoc V Le, He He, and Thang Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476–482, 2024.
- [67] Katherine M Collins, Albert Q Jiang, Simon Frieder, Lionel Wong, Miri Zilka, Umang Bhatt, Thomas Lukasiewicz, Yuhuai Wu, Joshua B Tenenbaum, William Hart, et al. Evaluating language models for mathematics through interactions. Proceedings of the National Academy of Sciences, 121(24):e2318124121, 2024.
- [68] Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, and Tianyu Gao. LitSearch: A retrieval benchmark for scientific literature search. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 15068–15083, 2024.
- [69] Qian Huang, Jian Vora, Percy Liang, and Jure Leskovec. MLAgentBench: Evaluating language agents on machine learning experimentation. In Proceedings of the 41st International Conference on Machine Learning, pages 20271–20309, 2024.
- [70] Minyang Tian, Luyu Gao, Shizhuo D Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, et al. SciCode: A research coding benchmark curated by scientists. Advances in Neural Information Processing Systems, 37:30624–30650, 2024.
- [71] Qingyun Wang, Doug Downey, Heng Ji, and Tom Hope. SciMON: Scientific inspiration machines optimized for novelty. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 279–299, 2024.
- [72] Chenglei Si, Diyi Yang, and Tatsunori Hashimoto. Can LLMs generate novel research ideas? A large-scale human study with 100+ NLP researchers. In The Thirteenth International Conference on Learning Representations, 2025.
- [73] Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. ResearchAgent: Iterative research idea generation over scientific literature with large language models. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025.
- [74] Chenglei Si, Tatsunori Hashimoto, and Diyi Yang. The ideation-execution gap: Execution outcomes of LLM-generated versus human research ideas. arXiv preprint arXiv:2506.20803, 2025.
- [75] Kevin Maik Jablonka, Philippe Schwaller, Andres Ortega-Guerrero, and Berend Smit. Leveraging large language models for predictive chemistry. Nature Machine Intelligence, 6(2):161–169, 2024.
- [76] Kieran Didi, Sarah Alamdari, Alex X. Lu, Bruce Wittmann, Kadina E. Johnston, Ava P. Amini, Ali Madani, Maya Czeneszew, Christian Dallago, and Kevin K. Yang. Flip2: Expanding protein fitness landscape benchmarks for real-world machine learning applications. bioRxiv, 2026. doi: 10.64898/2026.02.23.707496.
- [77] Haoran Ye, Jiarui Wang, Zhiguang Cao, Helan Liang, and Yong Li. DeepACO: Neural-enhanced ant systems for combinatorial optimization. Advances in Neural Information Processing Systems, 36:43706–43728, 2023.
- [78] Zhiqing Sun and Yiming Yang. DIFUSCO: Graph-based diffusion solvers for combinatorial optimization. Advances in Neural Information Processing Systems, 36:3706–3731, 2023.