An Evolutionary Algorithm of Linear complexity: Application to Training of Deep Neural Networks
Pith reviewed 2026-05-24 21:48 UTC · model grok-4.3
The pith
A novel evolutionary algorithm trains Restricted Boltzmann Machines with over one million parameters using only linear time and memory.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a novel evolutionary algorithm that requires O(n) operations and memory yet delivers competitive solutions for the training stage of Restricted Boltzmann Machines with over one million variables when compared against CMA-ES and the Contrastive Divergence algorithm.
What carries the argument
The novel evolutionary algorithm that enforces linear scaling in both memory and operations while performing parameter optimization for high-dimensional neural-network training tasks.
If this is right
- Training of Restricted Boltzmann Machines with parameter counts above one million becomes feasible under strict linear resource limits.
- High-dimensional neural-network parameter optimization can be performed by evolutionary methods without the quadratic overhead that previously restricted them to a few hundred variables.
- Direct head-to-head comparisons on million-variable instances show parity with both CMA-ES and Contrastive Divergence.
- Resource scaling for evolutionary training of deep networks now grows linearly rather than quadratically with the number of parameters.
Where Pith is reading between the lines
- The same linear-complexity evolutionary approach might be tested on other high-dimensional machine-learning problems where covariance estimation becomes prohibitive.
- Similar algorithms could be examined for training additional neural-network architectures beyond Restricted Boltzmann Machines.
- Linear evolutionary methods may open evolutionary optimization to settings where memory or compute budgets are tightly constrained yet model size remains large.
Load-bearing premise
An evolutionary algorithm can be constructed to keep solution quality competitive with quadratic methods on million-variable problems while strictly limiting itself to linear memory and operation counts.
What would settle it
Execute the algorithm on an RBM training task with precisely one million variables, record both final solution quality and actual memory-time usage, and check whether the usage remains linear and the quality remains within the range reported for CMA-ES and Contrastive Divergence.
Figures
read the original abstract
The performance of deep neural networks, such as Deep Belief Networks formed by Restricted Boltzmann Machines (RBMs), strongly depends on their training, which is the process of adjusting their parameters. This process can be posed as an optimization problem over n dimensions. However, typical networks contain tens of thousands of parameters, making this a High-Dimensional Problem (HDP). Although different optimization methods have been employed for this goal, the use of most of the Evolutionary Algorithms (EAs) becomes prohibitive due to their inability to deal with HDPs. For instance, the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) which is regarded as one of the most effective EAs, exhibits the enormous disadvantage of requiring $O(n^2)$ memory and operations, making it unpractical for problems with more than a few hundred variables. In this paper, we introduce a novel EA that requires $O(n)$ operations and memory, but delivers competitive solutions for the training stage of RBMs with over one million variables, when compared against CMA-ES and the Contrastive Divergence algorithm, which is the standard method for training RBMs.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a novel evolutionary algorithm requiring O(n) operations and memory that is claimed to produce competitive solutions when training Restricted Boltzmann Machines (RBMs) with more than one million parameters, outperforming or matching both CMA-ES and the standard Contrastive Divergence method.
Significance. If the central claim holds, the result would be significant because it would demonstrate that an evolutionary method can scale to the parameter counts typical of deep networks without the quadratic memory and time costs that render CMA-ES unusable beyond a few hundred variables. Credit is due for targeting a concrete, high-impact application (RBM training) and for explicitly stating the linear-complexity bound.
major comments (2)
- [Abstract] Abstract: the central claim that the O(n) EA remains competitive with CMA-ES on RBM instances exceeding 1 M variables rests on the untested assumption that a strictly linear (necessarily diagonal or separable) model can preserve search quality when the objective depends on products v_i h_j that induce strong correlations among the weight parameters; no scaling result or small-n comparison is supplied to show that the approximation does not degrade solution quality relative to full-covariance CMA-ES.
- [Abstract] The manuscript supplies neither the algorithm description, pseudocode, nor the explicit update rules that would allow verification that the stated O(n) bound is achieved without hidden quadratic terms; without these, it is impossible to assess whether the method is genuinely linear or merely claims linearity while retaining implicit dependence modeling.
minor comments (1)
- [Abstract] The abstract refers to 'over one million variables' but does not state the precise network architecture or the number of visible/hidden units used in the reported experiments.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that the O(n) EA remains competitive with CMA-ES on RBM instances exceeding 1 M variables rests on the untested assumption that a strictly linear (necessarily diagonal or separable) model can preserve search quality when the objective depends on products v_i h_j that induce strong correlations among the weight parameters; no scaling result or small-n comparison is supplied to show that the approximation does not degrade solution quality relative to full-covariance CMA-ES.
Authors: The referee correctly identifies that the abstract does not explicitly discuss small-n scaling behavior. The full manuscript reports experiments on RBMs with parameter counts ranging from hundreds to over one million, showing that the method remains competitive with CD on the largest instances and matches CMA-ES performance on smaller instances where the latter can be run. To strengthen the presentation, we will revise the abstract and add a dedicated paragraph in the experimental section that explicitly compares solution quality on small-n instances and discusses the empirical effect of the separable approximation on the weight correlations induced by the RBM energy function. revision: yes
-
Referee: [Abstract] The manuscript supplies neither the algorithm description, pseudocode, nor the explicit update rules that would allow verification that the stated O(n) bound is achieved without hidden quadratic terms; without these, it is impossible to assess whether the method is genuinely linear or merely claims linearity while retaining implicit dependence modeling.
Authors: We agree that the absence of pseudocode makes independent verification of the complexity claim more difficult. The manuscript describes the algorithm and its O(n) update rules in Section 3, but we will add explicit pseudocode together with a line-by-line complexity analysis in the revised version to eliminate any ambiguity about hidden quadratic terms. revision: yes
Circularity Check
No significant circularity; algorithm introduction is self-contained
full rationale
The paper introduces a novel O(n) evolutionary algorithm and reports empirical competitiveness on large RBM training tasks. No equations, fitted parameters, or self-referential definitions appear in the abstract or claimed derivation chain. The O(n) property is presented as a design feature of the new method rather than a prediction derived from data or prior self-citations. Competitiveness is framed as an experimental outcome against CMA-ES and CD, with no load-bearing step that reduces by construction to the inputs. This is the normal non-circular case for an algorithmic contribution paper.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Active covariance matrix adaptation for the (1+ 1)-cma-es
Dirk V Arnold and Nikolaus Hansen. Active covariance matrix adaptation for the (1+ 1)-cma-es. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 385–392. ACM, 2010
work page 2010
-
[2]
Multiple authors. The cma evolution strategy. urlhttp://cma.gforge.inria.fr/, 2016
work page 2016
- [3]
-
[4]
Shi Cheng, Bin Liu, T. O. Ting, Quande Qin, Yuhui Shi, and Kaizhu Huang. Survey on data science with population-based algorithms. Big Data Analytics, 1(1):3, Jul 2016
work page 2016
-
[5]
Kalyanmoy Deb and Christie Myburgh. Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, pages 653–660. ACM, 2016
work page 2016
- [6]
-
[7]
Nikolaus Hansen, Sibylle D Müller, and Petros Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es). Evolutionary computation, 11(1):1–18, 2003
work page 2003
-
[8]
Reducing the dimensionality of data with neural networks
Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006
work page 2006
-
[9]
Adam: A Method for Stochastic Optimization
Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014
work page internal anchor Pith review Pith/arXiv arXiv 2014
-
[10]
CMA-ES with restarts for solving cec 2013 benchmark problems
Ilya Loshchilov. CMA-ES with restarts for solving cec 2013 benchmark problems. In 2013 IEEE Congress on Evolutionary Computation, pages 369–376. Ieee, 2013
work page 2013
-
[11]
A computationally efficient limited memory cma-es for large scale optimization
Ilya Loshchilov. A computationally efficient limited memory cma-es for large scale optimization. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO ’14, pages 397–404, New York, NY , USA, 2014. ACM. 8 S. I. Valdez & A. Rojas-Domínguez - An Evolutionary Algorithm of Linear Complexity
work page 2014
-
[12]
Population size influence on the efficiency of evolutionary algorithms to design water networks
Daniel Mora-Melià, F Javier Martínez-Solano, Pedro L Iglesias-Rey, and Jimmy H Gutiérrez-Bahamondes. Population size influence on the efficiency of evolutionary algorithms to design water networks. Procedia Engineering, 186:341–348, 2017
work page 2017
-
[13]
A simple modification in cma-es achieving linear time and space complexity
Raymond Ros and Nikolaus Hansen. A simple modification in cma-es achieving linear time and space complexity. In International Conference on Parallel Problem Solving from Nature, pages 296–305. Springer, 2008
work page 2008
-
[14]
Deep learning in neural networks: An overview
Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85 – 117, 2015
work page 2015
-
[15]
Evolution of Descent Directions, pages 221–237
Alejandro Sierra Urrecho and Iván Santibáñez Koref. Evolution of Descent Directions, pages 221–237. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008
work page 2008
-
[16]
Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR, abs/1712.06567, 2017
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[17]
Approximating the search distribution to the selection distribution in edas
S Ivvan Valdez-Peña, Arturo Hernández-Aguirre, and Salvador Botello-Rionda. Approximating the search distribution to the selection distribution in edas. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 461–468. ACM, 2009
work page 2009
-
[18]
Yoshua Bengio Yann LeCun and Geoffrey Hinton. Deep learning. nature, 521(7553):436–442, 2015
work page 2015
- [19]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.