pith. machine review for the scientific record. sign in

arxiv: 2604.24968 · v1 · submitted 2026-04-27 · 💻 cs.NE

Recognition: unknown

The Effects of Population Size on the Performance of BEAGLE GPU-Based Genetic Programming Runs

Authors on Pith no claims yet

Pith reviewed 2026-05-07 17:07 UTC · model grok-4.3

classification 💻 cs.NE
keywords genetic programmingpopulation sizeGPUsymbolic regressionBEAGLEevolutionary searchstepped populations
0
0 comments X

The pith

GPU genetic programming succeeds with problem-dependent population sizes from 1000 to 10 million.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explores the effects of population size on the performance of BEAGLE, a GPU-based genetic programming system, when applied to symbolic regression problems. It shows that for some problems, very small constant populations like 1000 individuals enable effective narrow and deep searches. For other problems, very large constant populations up to 10 million individuals enable effective broad and shallow searches. Stepped population sizes that start large and reduce to small are also tested to balance these aspects of search. A sympathetic reader would care because this reveals how GPU hardware changes the optimal configuration for evolutionary search methods.

Core claim

The authors establish that constant population sizes in BEAGLE GPU genetic programming for symbolic regression yield benefits from narrow deep searches with as few as 1000 individuals for some problems and from broad shallow searches with as many as 10 million individuals for others, while stepped population sizes starting large and decreasing offer a means to balance breadth and depth of search.

What carries the argument

Varying constant and stepped population sizes within the BEAGLE GPU framework for genetic programming, which enables testing extreme scales of population dynamics previously unattainable on CPUs.

Load-bearing premise

The performance differences across population sizes result primarily from the population size and associated search breadth or depth rather than from confounding factors like GPU implementation details or other hyperparameters.

What would settle it

Re-running the symbolic regression experiments with the same population size variations but on a CPU-based genetic programming system to check whether the benefits of narrow versus broad searches hold without GPU-specific factors.

Figures

Figures reproduced from arXiv: 2604.24968 by Elijah Smith, Ilya Basin, Marzieh Kianinejad, Nathan Haut, Ruchika Gupta, Wolfgang Banzhaf, Zachary Perrico.

Figure 1
Figure 1. Figure 1: Shown here is the distribution of actual population sizes across all generations, runs, and experiments that used a target population size of 1 million. We can see that the actual population size falls within a normal distribution around the target size. 5.2 Constant Population Sizes The results of running constant population sizes ranging from small populations of 100 to large populations of 100 million i… view at source ↗
Figure 2
Figure 2. Figure 2: Table 2 shows how many times one of the benchmark problems was view at source ↗
Figure 2
Figure 2. Figure 2: Shown are the performance results on each equation using the different popu￾lation sizes. The bar height indicates the percentage of solved problems among the 30 repeats. Confidence bands are computed using the Wilson Interval. much benefit from using the GPU since its massive compute power is severely underutilized for such small population. 5.3 Stepped Population Sizes In view at source ↗
Figure 3
Figure 3. Figure 3: Shown here are the number of generations completed using each population size setup for all 30 runs across the 7 benchmark problems. The results show that the scaling is approximately linear, so there is an inverse linear relationship between population size and generations. 100 1K 10K 100K 1M 10M 100M 5×107 1×108 5×108 1×109 Pop Size Models Evaluated Model Evaluations vs Pop Size Scaling view at source ↗
Figure 4
Figure 4. Figure 4: Shown here are the total number of model evaluations over the population size setup across all 30 runs for all 7 benchmark problems. We can see that, regardless of population size, the number of model evaluations is comparable. 1.5 million and 3-4 generations when going from 5 million to 100,000. This shows that population size changes are more of a smooth transition than a hard step in this selection appr… view at source ↗
Figure 5
Figure 5. Figure 5: Population size changes with a step in population size target is triggered after generation 19. Left: Step-down from 5 million to 1.5 million target size; Right: Step￾down from 5 million to 100,000 target size. The results of running the stepped population sizes are shown in view at source ↗
Figure 6
Figure 6. Figure 6: Shown here are the performance results (solved %) of each stepped strategy across all 7 benchmark problems. Acknowledgments. The team acknowledges Noblis, Inc. for supporting the develop￾ment of the Beagle framework. This work was supported in part by Michigan State Uni￾versity through computational resources provided by the Institute for Cyber-Enabled Research view at source ↗
read the original abstract

The Beagle framework, through GPU-based Genetic Programming, enables population dynamics previously unattainable (within practical time frames) by CPU-constrained Genetic Programming systems. This work explores how GPU-enabled population sizes impact the success of training for symbolic regression problems. Specifically, when using constant population sizes, we see benefits of using very narrow and deep searches (as narrow as 1000 individuals) for some problems, while other problems benefit from very broad and shallow searches (as broad as 10 million individuals). We also explore stepped population sizes that start with large populations and drop to small populations to balance the breadth and depth of search.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents an empirical study using the BEAGLE GPU-based Genetic Programming framework to examine how population size affects training success on symbolic regression problems. It reports that constant population sizes produce problem-dependent outcomes, with some problems favoring narrow and deep searches (down to 1000 individuals) and others favoring broad and shallow searches (up to 10 million individuals). The work additionally explores stepped population size schedules that begin large and decrease over time to balance search breadth and depth.

Significance. If the performance differences can be shown to arise specifically from population size after isolating total computational budget, the findings would offer practical guidance for configuring large-scale GP runs on GPUs. This could inform problem-specific choices between deep versus wide search and the utility of dynamic population schedules, potentially improving efficiency in evolutionary computation applications.

major comments (2)
  1. [Abstract] Abstract: the reported benefits of narrow/deep (1000) versus broad/shallow (10M) searches cannot be cleanly attributed to population size. If generations are held fixed while population size varies, total fitness evaluations scale linearly with population size; the manuscript gives no indication that generations were adjusted, that results were normalized by total evaluations, or that fixed-budget controls were performed. This confound is load-bearing for the central claim that search shape (breadth vs depth) drives the problem-specific differences.
  2. [Experimental description] Experimental description: the abstract and summary provide no details on the number of independent runs, the specific symbolic regression benchmarks, statistical tests employed, or error bars/variance measures. Without these, it is impossible to assess whether the observed preferences for narrow versus broad searches are statistically reliable or reproducible.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped clarify key aspects of our experimental design and presentation. We have revised the manuscript to address the concerns about computational budget controls and methodological details, strengthening the attribution of results to population size effects in GPU-based GP.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the reported benefits of narrow/deep (1000) versus broad/shallow (10M) searches cannot be cleanly attributed to population size. If generations are held fixed while population size varies, total fitness evaluations scale linearly with population size; the manuscript gives no indication that generations were adjusted, that results were normalized by total evaluations, or that fixed-budget controls were performed. This confound is load-bearing for the central claim that search shape (breadth vs depth) drives the problem-specific differences.

    Authors: We agree that isolating the effect of search shape from total computational budget is essential. Our original experiments fixed the number of generations (detailed in Section 3) to examine how GPU-enabled population sizes affect parallel search breadth versus depth in practical runtimes. To directly address the potential confound, we have added fixed-budget experiments in a new subsection of the results, where generations are scaled inversely with population size to hold total fitness evaluations constant. These controls confirm that problem-specific preferences for narrow/deep versus broad/shallow searches persist, supporting our claims while ruling out simple evaluation-count explanations. revision: yes

  2. Referee: [Experimental description] Experimental description: the abstract and summary provide no details on the number of independent runs, the specific symbolic regression benchmarks, statistical tests employed, or error bars/variance measures. Without these, it is impossible to assess whether the observed preferences for narrow versus broad searches are statistically reliable or reproducible.

    Authors: We appreciate the need for these details to support reproducibility. The full paper's Experimental Setup section specifies 30 independent runs per configuration, the exact symbolic regression benchmarks (Nguyen, Keijzer, and Vladislavleva suites), Wilcoxon rank-sum tests for significance, and results with mean ± standard deviation. To make this immediately accessible, we have updated the abstract and added a concise methods summary table in the introduction. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical reporting of experimental observations

full rationale

The paper conducts and reports GPU-based genetic programming experiments varying population size (constant and stepped) on symbolic regression problems. No mathematical derivations, first-principles predictions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described content. All claims reduce directly to measured run outcomes rather than to any internal definition or prior author result by construction. The skeptic concern about evaluation budget confounding is a potential experimental-design issue but does not constitute circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The study is an empirical investigation that relies on the correctness of the BEAGLE GPU implementation and standard genetic programming mechanisms without introducing new theoretical constructs.

free parameters (1)
  • population_size
    Tested values (1000 to 10 million) and stepped schedules are experimental choices varied to observe effects rather than fitted parameters.
axioms (1)
  • domain assumption The BEAGLE framework correctly implements standard genetic programming operators and GPU-accelerated evaluation
    The paper builds directly on this framework without re-deriving or verifying its core mechanisms.

pith-pipeline@v0.9.0 · 5420 in / 1332 out tokens · 44995 ms · 2026-05-07T17:07:30.973000+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

35 extracted references · 2 canonical work pages

  1. [1]

    Arora, R

    R. Arora, R. Tulshyan, and K. Deb. Parallelization of binary and real-coded genetic algorithms on gpu using cuda. In IEEE Congress on Evolutionary Computation , pages 1–8. IEEE, 2010

  2. [2]

    Baeta, J

    F. Baeta, J. Correia, T. Martins, and P. Machado. Tensorgp–genetic programming engine in tensorflow. In International Conference on the Applications of Evolu- tionary Computation (Part of EvoStar) , pages 763–778. Springer, 2021

  3. [3]

    D. M. Chitty. Fast parallel genetic programming: Multi-core CPU versus many- core GPU. Soft Computing , 16(10):1795–1814, 2012

  4. [4]

    L. Fan, Z. Su, and X. Liu. An asynchronous parallel symbolic regression approach based on multiobjective genetic programming with GPU acceleration. Applied Soft Computing, page 115010, 2026

  5. [5]

    Farinati and L

    D. Farinati and L. Vanneschi. A survey on dynamic populations in bio-inspired algorithms. Genetic Programming and Evolvable Machines , 25(2):19, 2024

  6. [6]

    Ferigo and G

    A. Ferigo and G. Iacca. A gpu-enabled compact genetic algorithm for very large- scale optimization problems. Mathematics, 8(5):758, 2020

  7. [7]

    Fok, T.-T

    K.-L. Fok, T.-T. Wong, and M.-L. Wong. Evolutionary computing on consumer graphics hardware. IEEE Intelligent Systems , 22(2):69–78, 2007

  8. [8]

    Gagné, M

    C. Gagné, M. Parizeau, and M. Dubreuil. Distributed beagle: An environment for parallel and distributed evolutionary computations. In Proc. of the 17th Annual International Symposium on High Performance Computing Systems and Applica- tions (HPCS) , volume 2003, pages 201–208, 2003

  9. [9]

    Harding and W

    S. Harding and W. Banzhaf. Fast Genetic Programming on GPUs. In Euro- pean Conference on Genetic Programming (EuroGP 2007) , pages 90–101. Springer, 2007

  10. [10]

    Harding and W

    S. Harding and W. Banzhaf. Genetic programming on gpus for image processing. International Journal of High Performance Systems Architecture , 1(4):231–240, 2008

  11. [11]

    N. Haut, W. Banzhaf, and B. Punch. Correlation Versus RMSE Loss Functions in Symbolic Regression Tasks, pages 31–55. Springer Nature Singapore, Singapore, 2023

  12. [12]

    N. Haut, I. Basin, M. Kianinejad, R. Gupta, E. Smith, Z. Perrico, and W. Banzhaf. Gpu-accelerated genetic programming for symbolic regression with beagle frame- work. arXiv preprint arXiv:2603.12292 , 2026

  13. [13]

    Hu and W

    T. Hu and W. Banzhaf. The role of population size in rate of evolution in genetic programming. In European Conference on Genetic Programming (EuroGP 2009) , pages 85–96. Springer, 2009

  14. [14]

    T. Hu, S. Harding, and W. Banzhaf. Variable population size and evolution accel- eration: A case study with a parallel evolutionary algorithm. Genetic Programming and Evolvable Machines , 11(2):205–225, 2010

  15. [15]

    Kouchakpour, A

    P. Kouchakpour, A. Zaknich, and T. Bräunl. Dynamic population variation in genetic programming. Information Sciences , 179(8):1078–1091, 2009

  16. [16]

    J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection . MIT Press, Cambridge, MA, USA, 1992

  17. [17]

    W. B. Langdon. Graphics processing units and genetic programming: an overview. Soft computing , 15(8):1657–1669, 2011

  18. [18]

    W. B. Langdon and W. Banzhaf. A simd interpreter for genetic programming on gpu graphics cards. In European Conference on Genetic Programming (EuroGP 2008), pages 73–85. Springer, 2008. 16 N. Haut et al

  19. [19]

    Lange, Y

    R. Lange, Y. Tang, and Y. Tian. Neuroevobench: Benchmarking evolutionary op- timizers for deep learning applications. Advances in Neural Information Processing Systems, 36:32160–32172, 2023

  20. [20]

    Noblis. Beagle. https://github.com/Noblis/beagle-v1.x, 2026

  21. [21]

    Oh and K

    K.-S. Oh and K. Jung. Gpu implementation of neural networks. Pattern Recogni- tion, 37(6):1311–1314, 2004

  22. [22]

    Pandey, M

    M. Pandey, M. Fernandez, F. Gentile, O. Isayev, A. Tropsha, A. C. Stern, and A. Cherkasov. The transformational role of gpu computing and deep learning in drug discovery. Nature Machine Intelligence , 4(3):211–221, 2022

  23. [23]

    Robilliard, V

    D. Robilliard, V. Marion, and C. Fonlupt. High-performance Genetic Programming on GPU. In Proceedings of the 2009 Workshop on Bio-inspired Algorithms for Distributed Systems , pages 85–94, 2009

  24. [24]

    Sathia, V

    V. Sathia, V. Ganesh, and S. R. T. Nanditale. Accelerating genetic programming using gpus. arXiv preprint arXiv:2110.11226 , 2021

  25. [25]

    Staats, E

    K. Staats, E. Pantridge, M. Cavaglia, I. Milovanov, and A. Aniyan. Tensorflow enabled genetic programming. In Proceedings of the genetic and evolutionary com- putation conference companion, pages 1872–1879, 2017

  26. [26]

    Steinkrau, P

    D. Steinkrau, P. Y. Simard, and I. Buck. Using gpus for machine learning al- gorithms. In Proceedings of the Eighth International Conference on Document Analysis and Recognition, pages 1115–1119, 2005

  27. [27]

    Y. Tao, M. Li, and J. Cao. A new dynamic population variation in genetic pro- gramming. Computing and Informatics , 32(1):63–87, 2013

  28. [28]

    M. Tegmark. Welcome to the Feynman Symbolic Regression Database!

  29. [29]

    Tomassini, L

    M. Tomassini, L. Vanneschi, J. Cuendet, and F. Fernández. A new technique for dynamic size populations in genetic programming. In Proceedings of the 2004 Congress on Evolutionary Computation , volume 1, pages 486–493. IEEE, 2004

  30. [30]

    Trujillo, J

    L. Trujillo, J. M. M. Contreras, D. E. Hernandez, M. Castelli, and J. J. Tapia. GSGP-CUDA A CUDA Framework for Geometric Semantic Genetic Program- ming. SoftwareX, 18:101085, 2022

  31. [31]

    L. Wang, Z. Wu, K. Sun, Z. Li, and R. Cheng. EvoGP: A GPU-accelerated Frame- work for Tree-based Genetic Programming. arXiv e-prints , pages arXiv–2501, 2025

  32. [32]

    Wong, T.-T

    M.-L. Wong, T.-T. Wong, and K.-L. Fok. Parallel evolutionary algorithms on graphics processing unit. In 2005 IEEE Congress on Evolutionary Computation , volume 3, pages 2286–2293. IEEE, 2005

  33. [33]

    Z. Wu, L. Wang, K. Sun, Z. Li, and R. Cheng. Enabling population-level parallelism in tree-based genetic programming for gpu acceleration. IEEE Transactions on Evolutionary Computation , 2026

  34. [34]

    Zhang, A

    R. Zhang, A. Lensen, and Y. Sun. Speeding up genetic programming based sym- bolic regression using gpus. In Pacific Rim International Conference on Artificial Intelligence, pages 519–533. Springer, 2022

  35. [35]

    Zhang, Y

    R. Zhang, Y. Sun, and M. Zhang. GPU-Based Genetic Programming for Faster Feature Extraction in Binary Image Classification. IEEE Transactions on Evolu- tionary Computation , 28(6):1590–1604, 2023