arxiv: 2604.24968 · v1 · submitted 2026-04-27 · 💻 cs.NE

Recognition: unknown

The Effects of Population Size on the Performance of BEAGLE GPU-Based Genetic Programming Runs

Nathan Haut , Ilya Basin , Ruchika Gupta , Marzieh Kianinejad , Zachary Perrico , Elijah Smith , Wolfgang Banzhaf

Authors on Pith no claims yet

Pith reviewed 2026-05-07 17:07 UTC · model grok-4.3

classification 💻 cs.NE

keywords genetic programmingpopulation sizeGPUsymbolic regressionBEAGLEevolutionary searchstepped populations

0 comments

The pith

GPU genetic programming succeeds with problem-dependent population sizes from 1000 to 10 million.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explores the effects of population size on the performance of BEAGLE, a GPU-based genetic programming system, when applied to symbolic regression problems. It shows that for some problems, very small constant populations like 1000 individuals enable effective narrow and deep searches. For other problems, very large constant populations up to 10 million individuals enable effective broad and shallow searches. Stepped population sizes that start large and reduce to small are also tested to balance these aspects of search. A sympathetic reader would care because this reveals how GPU hardware changes the optimal configuration for evolutionary search methods.

Core claim

The authors establish that constant population sizes in BEAGLE GPU genetic programming for symbolic regression yield benefits from narrow deep searches with as few as 1000 individuals for some problems and from broad shallow searches with as many as 10 million individuals for others, while stepped population sizes starting large and decreasing offer a means to balance breadth and depth of search.

What carries the argument

Varying constant and stepped population sizes within the BEAGLE GPU framework for genetic programming, which enables testing extreme scales of population dynamics previously unattainable on CPUs.

Load-bearing premise

The performance differences across population sizes result primarily from the population size and associated search breadth or depth rather than from confounding factors like GPU implementation details or other hyperparameters.

What would settle it

Re-running the symbolic regression experiments with the same population size variations but on a CPU-based genetic programming system to check whether the benefits of narrow versus broad searches hold without GPU-specific factors.

Figures

Figures reproduced from arXiv: 2604.24968 by Elijah Smith, Ilya Basin, Marzieh Kianinejad, Nathan Haut, Ruchika Gupta, Wolfgang Banzhaf, Zachary Perrico.

**Figure 1.** Figure 1: Shown here is the distribution of actual population sizes across all generations, runs, and experiments that used a target population size of 1 million. We can see that the actual population size falls within a normal distribution around the target size. 5.2 Constant Population Sizes The results of running constant population sizes ranging from small populations of 100 to large populations of 100 million i… view at source ↗

**Figure 2.** Figure 2: Table 2 shows how many times one of the benchmark problems was view at source ↗

**Figure 2.** Figure 2: Shown are the performance results on each equation using the different population sizes. The bar height indicates the percentage of solved problems among the 30 repeats. Confidence bands are computed using the Wilson Interval. much benefit from using the GPU since its massive compute power is severely underutilized for such small population. 5.3 Stepped Population Sizes In view at source ↗

**Figure 3.** Figure 3: Shown here are the number of generations completed using each population size setup for all 30 runs across the 7 benchmark problems. The results show that the scaling is approximately linear, so there is an inverse linear relationship between population size and generations. 100 1K 10K 100K 1M 10M 100M 5×107 1×108 5×108 1×109 Pop Size Models Evaluated Model Evaluations vs Pop Size Scaling view at source ↗

**Figure 4.** Figure 4: Shown here are the total number of model evaluations over the population size setup across all 30 runs for all 7 benchmark problems. We can see that, regardless of population size, the number of model evaluations is comparable. 1.5 million and 3-4 generations when going from 5 million to 100,000. This shows that population size changes are more of a smooth transition than a hard step in this selection appr… view at source ↗

**Figure 5.** Figure 5: Population size changes with a step in population size target is triggered after generation 19. Left: Step-down from 5 million to 1.5 million target size; Right: Stepdown from 5 million to 100,000 target size. The results of running the stepped population sizes are shown in view at source ↗

**Figure 6.** Figure 6: Shown here are the performance results (solved %) of each stepped strategy across all 7 benchmark problems. Acknowledgments. The team acknowledges Noblis, Inc. for supporting the development of the Beagle framework. This work was supported in part by Michigan State University through computational resources provided by the Institute for Cyber-Enabled Research view at source ↗

read the original abstract

The Beagle framework, through GPU-based Genetic Programming, enables population dynamics previously unattainable (within practical time frames) by CPU-constrained Genetic Programming systems. This work explores how GPU-enabled population sizes impact the success of training for symbolic regression problems. Specifically, when using constant population sizes, we see benefits of using very narrow and deep searches (as narrow as 1000 individuals) for some problems, while other problems benefit from very broad and shallow searches (as broad as 10 million individuals). We also explore stepped population sizes that start with large populations and drop to small populations to balance the breadth and depth of search.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper tests extreme population sizes up to 10M in BEAGLE GPU GP for symbolic regression and reports problem-specific preferences for narrow-deep or broad-shallow search, plus stepped schedules, but the design likely confounds population size with total evaluations.

read the letter

This paper tests how population size affects training success in the BEAGLE GPU-based genetic programming system on symbolic regression problems. With constant sizes they find some problems do better with small populations around 1000 run deep, while others benefit from populations up to 10 million run shallow, and they also try stepped schedules that start large and drop smaller to balance the two.

Referee Report

2 major / 0 minor

Summary. The paper presents an empirical study using the BEAGLE GPU-based Genetic Programming framework to examine how population size affects training success on symbolic regression problems. It reports that constant population sizes produce problem-dependent outcomes, with some problems favoring narrow and deep searches (down to 1000 individuals) and others favoring broad and shallow searches (up to 10 million individuals). The work additionally explores stepped population size schedules that begin large and decrease over time to balance search breadth and depth.

Significance. If the performance differences can be shown to arise specifically from population size after isolating total computational budget, the findings would offer practical guidance for configuring large-scale GP runs on GPUs. This could inform problem-specific choices between deep versus wide search and the utility of dynamic population schedules, potentially improving efficiency in evolutionary computation applications.

major comments (2)

[Abstract] Abstract: the reported benefits of narrow/deep (1000) versus broad/shallow (10M) searches cannot be cleanly attributed to population size. If generations are held fixed while population size varies, total fitness evaluations scale linearly with population size; the manuscript gives no indication that generations were adjusted, that results were normalized by total evaluations, or that fixed-budget controls were performed. This confound is load-bearing for the central claim that search shape (breadth vs depth) drives the problem-specific differences.
[Experimental description] Experimental description: the abstract and summary provide no details on the number of independent runs, the specific symbolic regression benchmarks, statistical tests employed, or error bars/variance measures. Without these, it is impossible to assess whether the observed preferences for narrow versus broad searches are statistically reliable or reproducible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped clarify key aspects of our experimental design and presentation. We have revised the manuscript to address the concerns about computational budget controls and methodological details, strengthening the attribution of results to population size effects in GPU-based GP.

read point-by-point responses

Referee: [Abstract] Abstract: the reported benefits of narrow/deep (1000) versus broad/shallow (10M) searches cannot be cleanly attributed to population size. If generations are held fixed while population size varies, total fitness evaluations scale linearly with population size; the manuscript gives no indication that generations were adjusted, that results were normalized by total evaluations, or that fixed-budget controls were performed. This confound is load-bearing for the central claim that search shape (breadth vs depth) drives the problem-specific differences.

Authors: We agree that isolating the effect of search shape from total computational budget is essential. Our original experiments fixed the number of generations (detailed in Section 3) to examine how GPU-enabled population sizes affect parallel search breadth versus depth in practical runtimes. To directly address the potential confound, we have added fixed-budget experiments in a new subsection of the results, where generations are scaled inversely with population size to hold total fitness evaluations constant. These controls confirm that problem-specific preferences for narrow/deep versus broad/shallow searches persist, supporting our claims while ruling out simple evaluation-count explanations. revision: yes
Referee: [Experimental description] Experimental description: the abstract and summary provide no details on the number of independent runs, the specific symbolic regression benchmarks, statistical tests employed, or error bars/variance measures. Without these, it is impossible to assess whether the observed preferences for narrow versus broad searches are statistically reliable or reproducible.

Authors: We appreciate the need for these details to support reproducibility. The full paper's Experimental Setup section specifies 30 independent runs per configuration, the exact symbolic regression benchmarks (Nguyen, Keijzer, and Vladislavleva suites), Wilcoxon rank-sum tests for significance, and results with mean ± standard deviation. To make this immediately accessible, we have updated the abstract and added a concise methods summary table in the introduction. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical reporting of experimental observations

full rationale

The paper conducts and reports GPU-based genetic programming experiments varying population size (constant and stepped) on symbolic regression problems. No mathematical derivations, first-principles predictions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described content. All claims reduce directly to measured run outcomes rather than to any internal definition or prior author result by construction. The skeptic concern about evaluation budget confounding is a potential experimental-design issue but does not constitute circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The study is an empirical investigation that relies on the correctness of the BEAGLE GPU implementation and standard genetic programming mechanisms without introducing new theoretical constructs.

free parameters (1)

population_size
Tested values (1000 to 10 million) and stepped schedules are experimental choices varied to observe effects rather than fitted parameters.

axioms (1)

domain assumption The BEAGLE framework correctly implements standard genetic programming operators and GPU-accelerated evaluation
The paper builds directly on this framework without re-deriving or verifying its core mechanisms.

pith-pipeline@v0.9.0 · 5420 in / 1332 out tokens · 44995 ms · 2026-05-07T17:07:30.973000+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

35 extracted references · 2 canonical work pages

[1]

Arora, R

R. Arora, R. Tulshyan, and K. Deb. Parallelization of binary and real-coded genetic algorithms on gpu using cuda. In IEEE Congress on Evolutionary Computation , pages 1–8. IEEE, 2010

2010
[2]

Baeta, J

F. Baeta, J. Correia, T. Martins, and P. Machado. Tensorgp–genetic programming engine in tensorﬂow. In International Conference on the Applications of Evolu- tionary Computation (Part of EvoStar) , pages 763–778. Springer, 2021

2021
[3]

D. M. Chitty. Fast parallel genetic programming: Multi-core CPU versus many- core GPU. Soft Computing , 16(10):1795–1814, 2012

2012
[4]

L. Fan, Z. Su, and X. Liu. An asynchronous parallel symbolic regression approach based on multiobjective genetic programming with GPU acceleration. Applied Soft Computing, page 115010, 2026

2026
[5]

Farinati and L

D. Farinati and L. Vanneschi. A survey on dynamic populations in bio-inspired algorithms. Genetic Programming and Evolvable Machines , 25(2):19, 2024

2024
[6]

Ferigo and G

A. Ferigo and G. Iacca. A gpu-enabled compact genetic algorithm for very large- scale optimization problems. Mathematics, 8(5):758, 2020

2020
[7]

Fok, T.-T

K.-L. Fok, T.-T. Wong, and M.-L. Wong. Evolutionary computing on consumer graphics hardware. IEEE Intelligent Systems , 22(2):69–78, 2007

2007
[8]

Gagné, M

C. Gagné, M. Parizeau, and M. Dubreuil. Distributed beagle: An environment for parallel and distributed evolutionary computations. In Proc. of the 17th Annual International Symposium on High Performance Computing Systems and Applica- tions (HPCS) , volume 2003, pages 201–208, 2003

2003
[9]

Harding and W

S. Harding and W. Banzhaf. Fast Genetic Programming on GPUs. In Euro- pean Conference on Genetic Programming (EuroGP 2007) , pages 90–101. Springer, 2007

2007
[10]

Harding and W

S. Harding and W. Banzhaf. Genetic programming on gpus for image processing. International Journal of High Performance Systems Architecture , 1(4):231–240, 2008

2008
[11]

N. Haut, W. Banzhaf, and B. Punch. Correlation Versus RMSE Loss Functions in Symbolic Regression Tasks, pages 31–55. Springer Nature Singapore, Singapore, 2023

2023
[12]

N. Haut, I. Basin, M. Kianinejad, R. Gupta, E. Smith, Z. Perrico, and W. Banzhaf. Gpu-accelerated genetic programming for symbolic regression with beagle frame- work. arXiv preprint arXiv:2603.12292 , 2026

work page arXiv 2026
[13]

Hu and W

T. Hu and W. Banzhaf. The role of population size in rate of evolution in genetic programming. In European Conference on Genetic Programming (EuroGP 2009) , pages 85–96. Springer, 2009

2009
[14]

T. Hu, S. Harding, and W. Banzhaf. Variable population size and evolution accel- eration: A case study with a parallel evolutionary algorithm. Genetic Programming and Evolvable Machines , 11(2):205–225, 2010

2010
[15]

Kouchakpour, A

P. Kouchakpour, A. Zaknich, and T. Bräunl. Dynamic population variation in genetic programming. Information Sciences , 179(8):1078–1091, 2009

2009
[16]

J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection . MIT Press, Cambridge, MA, USA, 1992

1992
[17]

W. B. Langdon. Graphics processing units and genetic programming: an overview. Soft computing , 15(8):1657–1669, 2011

2011
[18]

W. B. Langdon and W. Banzhaf. A simd interpreter for genetic programming on gpu graphics cards. In European Conference on Genetic Programming (EuroGP 2008), pages 73–85. Springer, 2008. 16 N. Haut et al

2008
[19]

Lange, Y

R. Lange, Y. Tang, and Y. Tian. Neuroevobench: Benchmarking evolutionary op- timizers for deep learning applications. Advances in Neural Information Processing Systems, 36:32160–32172, 2023

2023
[20]

Noblis. Beagle. https://github.com/Noblis/beagle-v1.x, 2026

2026
[21]

Oh and K

K.-S. Oh and K. Jung. Gpu implementation of neural networks. Pattern Recogni- tion, 37(6):1311–1314, 2004

2004
[22]

Pandey, M

M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov. The transformational role of gpu computing and deep learning in drug discovery. Nature Machine Intelligence , 4(3):211–221, 2022

2022
[23]

Robilliard, V

D. Robilliard, V. Marion, and C. Fonlupt. High-performance Genetic Programming on GPU. In Proceedings of the 2009 Workshop on Bio-inspired Algorithms for Distributed Systems , pages 85–94, 2009

2009
[24]

Sathia, V

V. Sathia, V. Ganesh, and S. R. T. Nanditale. Accelerating genetic programming using gpus. arXiv preprint arXiv:2110.11226 , 2021

work page arXiv 2021
[25]

Staats, E

K. Staats, E. Pantridge, M. Cavaglia, I. Milovanov, and A. Aniyan. Tensorﬂow enabled genetic programming. In Proceedings of the genetic and evolutionary com- putation conference companion, pages 1872–1879, 2017

2017
[26]

Steinkrau, P

D. Steinkrau, P. Y. Simard, and I. Buck. Using gpus for machine learning al- gorithms. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 1115–1119, 2005

2005
[27]

Y. Tao, M. Li, and J. Cao. A new dynamic population variation in genetic pro- gramming. Computing and Informatics , 32(1):63–87, 2013

2013
[28]

M. Tegmark. Welcome to the Feynman Symbolic Regression Database!
[29]

Tomassini, L

M. Tomassini, L. Vanneschi, J. Cuendet, and F. Fernández. A new technique for dynamic size populations in genetic programming. In Proceedings of the 2004 Congress on Evolutionary Computation , volume 1, pages 486–493. IEEE, 2004

2004
[30]

Trujillo, J

L. Trujillo, J. M. M. Contreras, D. E. Hernandez, M. Castelli, and J. J. Tapia. GSGP-CUDA A CUDA Framework for Geometric Semantic Genetic Program- ming. SoftwareX, 18:101085, 2022

2022
[31]

L. Wang, Z. Wu, K. Sun, Z. Li, and R. Cheng. EvoGP: A GPU-accelerated Frame- work for Tree-based Genetic Programming. arXiv e-prints , pages arXiv–2501, 2025

2025
[32]

Wong, T.-T

M.-L. Wong, T.-T. Wong, and K.-L. Fok. Parallel evolutionary algorithms on graphics processing unit. In 2005 IEEE Congress on Evolutionary Computation , volume 3, pages 2286–2293. IEEE, 2005

2005
[33]

Z. Wu, L. Wang, K. Sun, Z. Li, and R. Cheng. Enabling population-level parallelism in tree-based genetic programming for gpu acceleration. IEEE Transactions on Evolutionary Computation , 2026

2026
[34]

Zhang, A

R. Zhang, A. Lensen, and Y. Sun. Speeding up genetic programming based sym- bolic regression using gpus. In Paciﬁc Rim International Conference on Artiﬁcial Intelligence, pages 519–533. Springer, 2022

2022
[35]

Zhang, Y

R. Zhang, Y. Sun, and M. Zhang. GPU-Based Genetic Programming for Faster Feature Extraction in Binary Image Classiﬁcation. IEEE Transactions on Evolu- tionary Computation , 28(6):1590–1604, 2023

2023