MeEvo: Metacognitive Evolution Combined with Natural Evolution for Automatic Heuristic Design
Pith reviewed 2026-06-30 10:51 UTC · model grok-4.3
The pith
MeEvo improves automatic heuristic design by cycling natural evolution of code with metacognitive reflection on reasoning traces.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MeEvo is an Automatic Heuristic Design (AHD) framework that cyclically couples Natural Evolution and Metacognitive Evolution, with an operator balance that shifts from exploration to exploitation. Natural Evolution explores heuristic code while recording LLM-generated reasoning traces, fitness values, errors, and the best heuristic into a shared history; Metacognitive Evolution then reflects on this history to generate improved heuristics that feed into the next Natural Evolution cycle. This design lets population-driven exploration and reflection-driven refinement reinforce each other.
What carries the argument
The cyclic coupling of Natural Evolution (population-based code exploration that records reasoning traces into a shared history) and Metacognitive Evolution (reflection on the history to refine the next population).
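The coupling described above can be illustrated with a toy loop. This is a minimal sketch, not the paper's implementation: LLM calls are replaced by numeric stand-ins, a "heuristic" is a single float, and the names `fitness`, `natural_evolution`, `metacognitive_evolution`, and `run_cycles` are all hypothetical. The operator-balance schedule is left out here to keep the loop readable.

```python
import random


def fitness(x):
    # Toy objective standing in for heuristic evaluation (lower is better).
    return (x - 3.0) ** 2


def natural_evolution(population, history, rng, step=1.0):
    """Mutate each candidate, keep the better of parent and child, log every child."""
    next_pop = []
    for parent in population:
        child = parent + rng.gauss(0.0, step)
        # Assumption: the "reasoning trace" is just a text note in this toy.
        history.append({
            "trace": f"mutated {parent:.3f} -> {child:.3f}",
            "candidate": child,
            "fitness": fitness(child),
        })
        next_pop.append(min(parent, child, key=fitness))
    return next_pop


def metacognitive_evolution(history, top_k=3):
    """Stand-in for LLM reflection: average the best candidates logged so far."""
    best = sorted(history, key=lambda rec: rec["fitness"])[:top_k]
    return sum(rec["candidate"] for rec in best) / len(best)


def run_cycles(n_cycles=5, pop_size=4, seed=0):
    rng = random.Random(seed)
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    history = []
    for _ in range(n_cycles):
        population = natural_evolution(population, history, rng)
        refined = metacognitive_evolution(history)
        # The reflected candidate replaces the worst member and seeds the next cycle.
        worst = max(range(pop_size), key=lambda i: fitness(population[i]))
        population[worst] = refined
    return min(population, key=fitness), history


best, history = run_cycles()
print(f"best candidate: {best:.3f}, logged attempts: {len(history)}")
```

The point of the sketch is the data flow: the evolution step writes to `history`, the reflection step reads from it, and the reflected candidate re-enters the population.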
If this is right
- Population-level code recombination and reflection on design traces become mutually reinforcing rather than separate.
- Search efficiency and stability increase because knowledge from earlier design decisions is retained and reused.
- Performance gains are largest on problems with complex constraints where isolated evolution or isolated reflection is insufficient.
- The framework produces heuristics with lower variance across runs than either standalone paradigm.
- Operator balance can be adjusted to move from broad exploration early to focused exploitation later.
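The summary does not state the schedule that moves the operator balance from exploration to exploitation. As an illustration only, the sketch below assumes a linear schedule; the function names, the operator labels `"explore"` and `"exploit"`, and the default endpoints 0.8 and 0.2 are hypothetical choices, not values from the paper.

```python
import random


def exploration_weight(cycle, total_cycles, start=0.8, end=0.2):
    """Linearly interpolate the exploration probability from start to end."""
    if total_cycles <= 1:
        return end
    frac = cycle / (total_cycles - 1)
    return start + (end - start) * frac


def pick_operator(cycle, total_cycles, rng):
    """Sample an operator family according to the current exploration weight."""
    p = exploration_weight(cycle, total_cycles)
    return "explore" if rng.random() < p else "exploit"


rng = random.Random(0)
for cycle in range(5):
    weight = exploration_weight(cycle, 5)
    print(cycle, round(weight, 2), pick_operator(cycle, 5, rng))
```

Any monotone schedule (step-wise, exponential) would fit the same interface; the linear form is simply the easiest to inspect.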
Where Pith is reading between the lines
- The same cyclic pattern could be tested on LLM-driven design of non-heuristic artifacts such as neural architectures or scheduling policies.
- If the shared history is made persistent across entirely different problem domains, cross-domain knowledge transfer might emerge without explicit transfer learning.
- The approach suggests that evolutionary computation in general could benefit from retaining intermediate reasoning artifacts rather than only final solutions or code.
- An open question is whether the reflection step could be replaced by a lighter non-LLM process once the history structure is fixed.
Load-bearing premise
Recording LLM-generated reasoning traces, fitness values, errors, and best heuristics into a shared history will allow metacognitive reflection to generate improved heuristics that meaningfully reinforce the next natural-evolution cycle.
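To make the premise concrete, here is one hypothetical shape the shared history could take. The source names its contents (reasoning traces, fitness values, errors, best heuristic) but not its structure, so the classes `Attempt` and `SharedHistory`, their fields, and the lower-is-better fitness convention are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    code: str                        # heuristic source text
    trace: str                       # reasoning recorded alongside the code
    fitness: Optional[float] = None  # None when the candidate failed to run
    error: Optional[str] = None      # error message for failed candidates


@dataclass
class SharedHistory:
    attempts: List[Attempt] = field(default_factory=list)

    def record(self, attempt):
        self.attempts.append(attempt)

    def best(self):
        # Assumption: lower fitness is better.
        scored = [a for a in self.attempts if a.fitness is not None]
        return min(scored, key=lambda a: a.fitness) if scored else None

    def reflection_context(self, max_items=3):
        """Plain-text digest that a reflection prompt could embed."""
        scored = sorted(
            (a for a in self.attempts if a.fitness is not None),
            key=lambda a: a.fitness,
        )
        lines = [f"fitness={a.fitness:.4f} | reasoning: {a.trace}"
                 for a in scored[:max_items]]
        lines += [f"error: {a.error} | reasoning: {a.trace}"
                  for a in self.attempts if a.error]
        return "\n".join(lines)


history = SharedHistory()
history.record(Attempt("def h(x): ...", "greedy by distance", fitness=2.0))
history.record(Attempt("def h(x): ...", "add a capacity penalty", fitness=1.5))
history.record(Attempt("def h(x): ...", "divide by slack", error="ZeroDivisionError"))
print(history.reflection_context())
```

Keeping failed attempts in the digest is deliberate: the premise only holds if the reflection step can see what went wrong as well as what scored well.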
What would settle it
Run MeEvo and the compared LLM-based architectures on the same five optimization problems. If MeEvo shows neither better solution quality nor lower run-to-run variance, especially on the complex constrained instances, the central claim is falsified.
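The check above can be written down as a small comparison over per-run objective values. This is a sketch under assumptions: the problem is a minimization, spread is measured as the sample standard deviation, and the numbers are made up for illustration. A real comparison would also need a significance test and matched LLM call budgets.

```python
from statistics import mean, stdev


def summarize(runs):
    """Mean and sample standard deviation of per-run objective values."""
    return {"mean": mean(runs), "std": stdev(runs)}


def shows_gain(candidate_runs, baseline_runs):
    """True if the candidate has a lower mean objective or a lower spread."""
    cand = summarize(candidate_runs)
    base = summarize(baseline_runs)
    return cand["mean"] < base["mean"] or cand["std"] < base["std"]


# Made-up objective values for three independent runs of each method.
candidate = [10.0, 10.2, 9.8]
baseline = [10.5, 11.5, 9.5]

print(summarize(candidate), summarize(baseline))
print("gain observed:", shows_gain(candidate, baseline))
```

Under this reading, the claim fails on a problem only when `shows_gain` is false, that is, when neither the mean nor the spread improves.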
Original abstract
Large Language Models (LLMs) have advanced Automatic Heuristic Design (AHD) by enabling heuristic generation through reasoning and code synthesis. In LLM-based AHD, the LLM reasons about algorithm design and generates executable heuristic code. Existing architectures adopt two main paradigms: Natural Evolution applies crossover and mutation to this code to explore diverse strategies, but discards the reasoning traces behind the design decisions, weakening knowledge inheritance; Metacognitive Evolution retains these reasoning traces and refines them through reflection, but lacks population-level recombination, limiting exploration. These limitations reduce search efficiency, stability, and solution quality on complex problems. To address this gap, we propose MeEvo, an AHD framework that cyclically couples Natural Evolution and Metacognitive Evolution with operator balance that shifts from exploration to exploitation. Natural Evolution explores heuristic code while recording LLM-generated reasoning traces, fitness values, errors and best heuristic into a shared history; Metacognitive Evolution then reflects on this history to generate improved heuristics that feed into the next Natural Evolution cycle. This design enables population-driven exploration and reflection-driven refinement to reinforce each other. Experiments on five optimization problems show that MeEvo achieves stronger performance and lower variance than tested LLM-based AHD architectures, especially on complex constrained tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes MeEvo, an AHD framework that cyclically couples Natural Evolution (population-based crossover/mutation on heuristic code while recording reasoning traces, fitness, errors, and best heuristics into a shared history) with Metacognitive Evolution (reflection on that history to refine heuristics). An operator balance shifts from exploration to exploitation across cycles. Experiments on five optimization problems are claimed to demonstrate stronger performance and lower variance than tested LLM-based AHD architectures, with particular gains on complex constrained tasks.
Significance. If the performance gains are attributable to the cyclic reinforcement mechanism rather than ancillary factors, the work would meaningfully advance LLM-based automatic heuristic design by addressing the documented trade-off between population-level exploration and reasoning-trace inheritance. The explicit shared-history design and shifting balance constitute a concrete, testable contribution to the field.
major comments (2)
- [Experiments] Experiments section: The central claim that MeEvo outperforms baselines 'due to' the cyclic coupling between Natural Evolution and Metacognitive Evolution is not supported by any ablation that removes the shared-history reflection step or the operator-balance shift. Without such controls, gains cannot be isolated from increased LLM calls, prompting differences, or implementation details.
- [§4] §4 (problem definitions and baselines): No explicit definitions of the five optimization problems, LLM version(s), run counts, variance reporting method, or baseline re-implementations are provided. This information is load-bearing for the cross-architecture comparison and the claim of particular advantage on constrained tasks.
minor comments (2)
- [Abstract / §3] The abstract and introduction use 'operator balance that shifts from exploration to exploitation' without a precise schedule or pseudocode; a short algorithmic description would clarify the transition rule.
- [§3] Notation for the shared history (traces, fitness, errors, best heuristics) is introduced informally; a compact table or equation defining its structure would aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our contributions. We address each major point below and commit to revisions that strengthen the empirical support and reproducibility of the work.
- Referee: [Experiments] Experiments section: The central claim that MeEvo outperforms baselines 'due to' the cyclic coupling between Natural Evolution and Metacognitive Evolution is not supported by any ablation that removes the shared-history reflection step or the operator-balance shift. Without such controls, gains cannot be isolated from increased LLM calls, prompting differences, or implementation details.
  Authors: We agree that the current experiments do not include direct ablations that isolate the shared-history reflection step and the operator-balance shift while holding total LLM calls and prompting constant. The existing comparisons are against standalone Natural Evolution and Metacognitive Evolution baselines, which provide indirect evidence that the cyclic combination yields gains, but they do not fully rule out ancillary factors. In the revised manuscript we will add two targeted ablations: (1) a version without the shared-history reflection (i.e., Natural Evolution only, with equivalent call budget) and (2) a version with fixed operator balance instead of the scheduled shift. These will be reported with the same metrics and run counts, allowing readers to attribute performance differences more precisely.
  Revision: yes
- Referee: [§4] §4 (problem definitions and baselines): No explicit definitions of the five optimization problems, LLM version(s), run counts, variance reporting method, or baseline re-implementations are provided. This information is load-bearing for the cross-architecture comparison and the claim of particular advantage on constrained tasks.
  Authors: The referee is correct that these details are essential for reproducibility and for substantiating the claim of advantage on constrained tasks. The revised Section 4 will explicitly define each of the five problems (including mathematical formulations, decision variables, constraints, and objective functions), state the exact LLM versions and API settings, report the number of independent runs (with seed information), describe the variance computation (standard deviation across runs), and provide pseudocode or repository links for the baseline re-implementations to ensure identical experimental conditions.
  Revision: yes
Circularity Check
No circularity: purely empirical claims with no derivations or self-referential reductions
The paper advances an empirical framework (MeEvo) and reports measured performance gains on five optimization problems. The abstract and described content contain no equations, fitted parameters, uniqueness theorems, or derivation chains. Claims rest on experimental comparison rather than any identity or prediction that reduces to its own inputs by construction. No self-citation load-bearing steps or ansatz smuggling appear in the provided text. This matches the default expectation of a non-circular empirical paper.
Prompt fragments (truncated in the source where marked "...")
- Strict Requirements for Parameter Access:
  - All parameters must be accessed using OBJECT.ATTRIBUTE notation only
  - NEVER use dictionary-style access like data_al['lb'] or data_al[0]
  - Required parameter accesses that must use dot notation:
    * data_al.lb - lower bounds
    * data_al.ub - upper bounds
    * data_al.dim - problem dimension
    * data_al.SearchAgents - po...
- Mandatory Implementation Structure:
    # 1. Parameter initialization (MUST use dot notation)
    lb = np.array(data_al.lb)
    ub = np.array(data_al.ub)
    dim = data_al.dim
    SearchAgents_no = data_al.SearchAgents
    # 2. Position initialization
    ub_array = np.array(ub)
    lb_array = np.array(lb)
    position initialization logic
- Core Algorithm Requirements:
  - Must implement both exploration and exploitation layers
  - Must include boundary constraint handling
  - Must use cosine-based position updates
  - Must maintain roulette wheel selection
- You are an expert in optimization algorithm design, metaheuristics, and computational complexity analysis
- Input/Output Specifications:
  Input Parameters:
  - data_al: Algorithm config object with dot-accessible attributes
  - data_pb: Problem data object
  - Positions: Current population positions
  - Best_pos: Current best solution
  - Best_score: Current best fitness
  - rg: Current search radius
  Returns:
  - Updated Positions array only
  - NO other return values allowed
  H...