Simultaneous Model-Based Evolution of Constants and Expression Structure in GP-GOMEA for Symbolic Regression
Pith reviewed 2026-06-28 11:59 UTC · model grok-4.3
The pith
Merging real-valued GOMEA with GP-GOMEA optimizes constants and structure simultaneously for better symbolic regression.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that merging the real-valued variant of GOMEA with GP-GOMEA enables simultaneous optimization of constants and expression structure. It further claims that this integrated method generally performs best among the constant-handling approaches tested in GP-GOMEA, including when those approaches are combined with commonly used techniques such as linear scaling, restarts, and constant tuning after GP optimization.
What carries the argument
The merged GP-GOMEA that performs simultaneous model-based evolution of constants and expression structure.
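To make the idea of a joint variation step concrete, here is a minimal sketch under strong assumptions. It is not the paper's algorithm: the fixed expression template `op(c0 * x, c1)`, the hand-picked gene subsets, the toy population, and the rule of copying a donor constant plus Gaussian noise are all hypothetical simplifications. GP-GOMEA learns its linkage subsets from the population, and real-valued GOMEA samples constants from an estimated distribution. The only element carried over is the general optimal-mixing rule: change a subset of genes and keep the change only if the error does not get worse.

```python
import random

# Hypothetical operator set for a fixed template op(c0 * x, c1).
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def evaluate(genes, x):
    """Evaluate the template; genes = [op_name, c0, c1]."""
    op, c0, c1 = genes
    return OPS[op](c0 * x, c1)


def error(genes, data):
    """Mean squared error of the encoded expression on (x, y) pairs."""
    return sum((evaluate(genes, x) - y) ** 2 for x, y in data) / len(data)


def mix(parent, population, subsets, data, rng, sigma=0.1):
    """One mixing pass over gene subsets that may hold structure and constants."""
    child = list(parent)
    best = error(child, data)
    for subset in subsets:
        donor = rng.choice(population)
        backup = list(child)
        for i in subset:
            gene = donor[i]
            if isinstance(gene, float):
                # Assumption: perturb copied constants with Gaussian noise.
                gene += rng.gauss(0.0, sigma)
            child[i] = gene
        new = error(child, data)
        if new <= best:
            best = new      # accept: not worse than before
        else:
            child = backup  # reject: restore the previous genes
    return child


# Hypothetical data drawn from y = 2x + 1 and a toy population.
data = [(float(x), 2.0 * x + 1.0) for x in range(5)]
population = [
    ["mul", 1.0, 1.0],
    ["add", 2.2, 0.3],
    ["add", 0.5, 1.1],
    ["mul", 2.0, 3.0],
]
# The subset [0, 1] couples a structure gene with a constant gene.
subsets = [[0, 1], [2], [0, 1, 2]]
rng = random.Random(0)
parent = population[0]
child = mix(parent, population, subsets, data, rng)
print(parent, error(parent, data))
print(child, error(child, data))
```

Because every change is accepted only when the error does not increase, the child is never worse than the parent on the training data. The point of the sketch is the subset `[0, 1]`, which lets an operator and a constant change together in a single accepted step.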
If this is right
- The simultaneous method outperforms variants that use ephemeral random constants or tune constants only after evolution.
- Joint optimization of structure and constants produces more accurate expressions while keeping them compact.
- The approach works well when combined with techniques such as linear scaling and restarts.
- Results confirm that well-integrated handling of mixed discrete-continuous variables improves outcomes on symbolic regression.
Where Pith is reading between the lines
- The same simultaneous-optimization principle could be tested in other evolutionary algorithms that mix discrete structures with continuous parameters.
- Real-world applications in scientific modeling might benefit from fewer separate tuning phases if the joint method scales to noisy or high-dimensional data.
- Other genetic programming systems that currently fix constants early could be re-examined to see whether co-evolution reduces the need for post-processing.
Load-bearing premise
That interactions between expression structure and constants are strong enough that separate optimization steps will miss better solutions.
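A toy calculation can illustrate what such an interaction looks like. The sketch below is an assumption-laden example, not taken from the paper: the data, the two candidate structures, and the fixed constant `1.0` are hypothetical. It compares two structures once with an untuned constant and once with the least-squares optimal constant, and shows that the ranking of the structures can flip.

```python
def mse(pred, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, targets)) / len(targets)


def best_constant(basis, targets):
    """Least-squares optimum of c for the model c * basis(x)."""
    return sum(g * t for g, t in zip(basis, targets)) / sum(g * g for g in basis)


# Hypothetical data generated by y = 0.5 * x^2.
xs = [0.0, 1.0, 2.0]
targets = [0.5 * x * x for x in xs]

# Two candidate structures, each with a single constant c.
structures = {
    "c*x": [x for x in xs],
    "c*x*x": [x * x for x in xs],
}

# Error when the constant is fixed at 1.0 and never tuned.
fixed = {
    name: mse([1.0 * g for g in basis], targets)
    for name, basis in structures.items()
}

# Error when the constant is tuned for each structure.
tuned = {}
for name, basis in structures.items():
    c = best_constant(basis, targets)
    tuned[name] = mse([c * g for g in basis], targets)

best_fixed = min(fixed, key=fixed.get)
best_tuned = min(tuned, key=tuned.get)
print("best with fixed constant:", best_fixed)
print("best with tuned constant:", best_tuned)
```

With the constant fixed, the linear structure looks better; once constants are tuned, the quadratic structure fits the data exactly. A search that judges structures only with untuned constants could therefore discard the structure that would have won.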
What would settle it
Applying the merged algorithm and the strongest non-simultaneous constant-handling variant to the same benchmark problems and finding no accuracy advantage for the merged version in most cases.
Abstract
Genetic programming (GP) approaches are among the state-of-the-art for symbolic regression, the task of constructing symbolic expressions that fit well with data. To find highly accurate symbolic expressions, both the expression structure and any contained real-valued constants are important. GP-GOMEA, a modern model-based evolutionary algorithm, is one of the leading algorithms for finding accurate, yet compact expressions. Yet, GP-GOMEA does not perform dedicated constant optimization, but rather uses ephemeral random constants. Hence, the accuracy of GP-GOMEA may well still be improved upon by the incorporation of a constant optimization mechanism. Existing research into mixed discrete-continuous optimization with EAs has shown that a simultaneous and well-integrated approach to optimizing both discrete and continuous parts leads to the best results on a variety of problems, especially when there are interactions between these parts. In this paper, we therefore propose a novel approach where constants in expressions are optimized at the same time as the expression structure by merging the real-valued variant of GOMEA with GP-GOMEA. The proposed approach is compared to other forms of handling constants in GP-GOMEA, and in the context of other commonly used techniques such as linear scaling, restarts, and constant tuning after GP optimization. Our results indicate that our novel approach generally performs best and confirm the importance of simultaneous constant optimization during evolution.
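Linear scaling, one of the techniques named in the abstract, can be sketched briefly. In its usual form it fits an intercept `a` and slope `b` by least squares so that `a + b * f(x)` matches the targets, which lets the search focus on the shape of `f`. The code below is a minimal illustration of that closed-form fit; the data and the candidate expression `x * x` are hypothetical, and nothing here reflects how the paper's implementation applies it.

```python
def linear_scaling(outputs, targets):
    """Return (a, b) minimising the squared error of a + b * outputs."""
    n = len(outputs)
    mean_f = sum(outputs) / n
    mean_y = sum(targets) / n
    var_f = sum((f - mean_f) ** 2 for f in outputs)
    if var_f == 0.0:
        # A constant expression: the best fit is the target mean.
        return mean_y, 0.0
    cov = sum((f - mean_f) * (y - mean_y) for f, y in zip(outputs, targets))
    b = cov / var_f
    a = mean_y - b * mean_f
    return a, b


def mse(pred, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, targets)) / len(targets)


# Hypothetical data from y = 3 * x^2 + 5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [3.0 * x * x + 5.0 for x in xs]

# Candidate expression x * x: right shape, wrong scale and offset.
raw = [x * x for x in xs]
a, b = linear_scaling(raw, targets)
scaled = [a + b * f for f in raw]

print("unscaled MSE:", mse(raw, targets))
print("scaled MSE:", mse(scaled, targets))
```

Linear scaling only repairs the outermost slope and intercept. Constants nested inside the expression are untouched, which is where dedicated constant optimization comes in.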
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes extending GP-GOMEA by merging it with the real-valued GOMEA variant, enabling simultaneous model-based evolution of both expression structure (discrete) and constants (continuous). It empirically compares this integrated approach against ephemeral random constants, post-evolution constant tuning, linear scaling, and restarts on symbolic regression tasks, concluding that the novel simultaneous method generally performs best and underscoring the value of joint optimization when interactions exist between structure and constants.
Significance. If the empirical results hold under scrutiny, the work provides concrete evidence supporting integrated discrete-continuous optimization in model-based evolutionary algorithms for symbolic regression, a domain where constant accuracy directly affects model quality. It builds directly on established GOMEA linkage-learning machinery without introducing new free parameters or self-referential definitions, and the experimental design explicitly contrasts simultaneous versus sequential constant handling.
minor comments (3)
- Abstract and §4 (results): the statement that the novel approach 'generally performs best' should be accompanied by explicit reporting of the number of benchmarks, the statistical tests applied (e.g., Wilcoxon or Friedman with post-hoc correction), and effect-size measures; without these, the strength of the central empirical claim is difficult to assess from the summary tables alone.
- §3.2 (method): the description of how the real-valued linkage model is merged with the GP-GOMEA dependency model would benefit from a short pseudocode fragment or diagram illustrating the joint sampling step, to make the integration reproducible.
- Table captions and §4: ensure every table reports the exact number of independent runs and any parameter settings that differ from the cited GP-GOMEA baseline, to avoid ambiguity when readers attempt replication.
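As an illustration of the kind of reporting the first comment asks for, the sketch below computes an exact two-sided Wilcoxon signed-rank test for paired per-benchmark errors using only the standard library. The error values are hypothetical placeholders, not results from the paper, and the exact enumeration is only practical for a small number of benchmarks. A multiple-comparison correction would still be needed when more than two methods are compared.

```python
from itertools import product


def wilcoxon_signed_rank(a, b):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w, p), where w is the smaller of the two signed rank sums.
    Zero differences are dropped; tied magnitudes get average ranks.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)
    # Enumerate every sign assignment under the null hypothesis.
    count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        wp = sum(r for r, s in zip(ranks, signs) if s)
        if min(wp, total - wp) <= w:
            count += 1
    return w, count / 2 ** len(ranks)


# Hypothetical test errors on 8 benchmarks (lower is better).
merged = [0.10, 0.20, 0.15, 0.30, 0.25, 0.12, 0.40, 0.18]
baseline = [0.14, 0.23, 0.13, 0.38, 0.31, 0.17, 0.50, 0.19]

w, p = wilcoxon_signed_rank(merged, baseline)
wins = sum(m < s for m, s in zip(merged, baseline))
print("wins:", wins, "of", len(merged), "W:", w, "p:", p)
```

Reporting the win count, the test statistic, and the p-value together gives readers a way to judge how strong "generally performs best" is; an effect-size measure would complete the picture.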
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. The feedback affirms the contribution of simultaneous discrete-continuous optimization via the merged GOMEA variants. No specific major comments were provided in the report.
Circularity Check
Minor self-citation on GOMEA background; empirical claim independent
full rationale
The paper's core contribution is an empirical comparison of constant-handling variants (ephemeral random constants, post-tuning, linear scaling, restarts, and the proposed simultaneous model-based optimization) on symbolic regression benchmarks. The performance claim is grounded in experimental results rather than any derivation that reduces to fitted inputs or self-referential definitions. The background statement on mixed discrete-continuous EAs is presented as established prior work and does not serve as a load-bearing uniqueness theorem or ansatz for the reported outcomes. No equations or algorithmic steps in the provided text exhibit self-definitional, fitted-prediction, or renaming circularity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Existing research into mixed discrete-continuous optimization with EAs has shown that a simultaneous and well-integrated approach to optimizing both discrete and continuous parts leads to the best results on a variety of problems, especially when there are interactions between these parts.