An Evolutionary Algorithm of Linear complexity: Application to Training of Deep Neural Networks

Alfonso Rojas-Dom\'inguez; S. Ivvan Valdez

arxiv: 1907.05951 · v1 · pith:3F2KR2LSnew · submitted 2019-07-12 · 💻 cs.NE

An Evolutionary Algorithm of Linear complexity: Application to Training of Deep Neural Networks

S. Ivvan Valdez , Alfonso Rojas-Dom\'inguez This is my paper

Pith reviewed 2026-05-24 21:48 UTC · model grok-4.3

classification 💻 cs.NE

keywords evolutionary algorithmlinear complexityrestricted boltzmann machinesdeep neural networkshigh-dimensional optimizationneural network trainingO(n) memorycontrastive divergence

0 comments

The pith

A novel evolutionary algorithm trains Restricted Boltzmann Machines with over one million parameters using only linear time and memory.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a new evolutionary algorithm built for high-dimensional optimization problems such as adjusting the parameters of Restricted Boltzmann Machines inside deep belief networks. Typical evolutionary methods require quadratic memory and operations and therefore cannot handle networks with tens or hundreds of thousands of parameters. The presented algorithm instead uses only linear resources yet produces training results that compete with both CMA-ES and the standard Contrastive Divergence procedure on instances exceeding one million variables. A reader would care because the result suggests that evolutionary search can remain viable for the scale of modern neural-network training without the usual resource explosion. The central demonstration is that strict linear scaling does not force a loss of solution quality on these particular large instances.

Core claim

The authors introduce a novel evolutionary algorithm that requires O(n) operations and memory yet delivers competitive solutions for the training stage of Restricted Boltzmann Machines with over one million variables when compared against CMA-ES and the Contrastive Divergence algorithm.

What carries the argument

The novel evolutionary algorithm that enforces linear scaling in both memory and operations while performing parameter optimization for high-dimensional neural-network training tasks.

If this is right

Training of Restricted Boltzmann Machines with parameter counts above one million becomes feasible under strict linear resource limits.
High-dimensional neural-network parameter optimization can be performed by evolutionary methods without the quadratic overhead that previously restricted them to a few hundred variables.
Direct head-to-head comparisons on million-variable instances show parity with both CMA-ES and Contrastive Divergence.
Resource scaling for evolutionary training of deep networks now grows linearly rather than quadratically with the number of parameters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same linear-complexity evolutionary approach might be tested on other high-dimensional machine-learning problems where covariance estimation becomes prohibitive.
Similar algorithms could be examined for training additional neural-network architectures beyond Restricted Boltzmann Machines.
Linear evolutionary methods may open evolutionary optimization to settings where memory or compute budgets are tightly constrained yet model size remains large.

Load-bearing premise

An evolutionary algorithm can be constructed to keep solution quality competitive with quadratic methods on million-variable problems while strictly limiting itself to linear memory and operation counts.

What would settle it

Execute the algorithm on an RBM training task with precisely one million variables, record both final solution quality and actual memory-time usage, and check whether the usage remains linear and the quality remains within the range reported for CMA-ES and Contrastive Divergence.

Figures

Figures reproduced from arXiv: 1907.05951 by Alfonso Rojas-Dom\'inguez, S. Ivvan Valdez.

**Figure 1.** Figure 1: DBN architecture used for (a) images of 7 × 7 pixels. (b) images of 28 × 28 pixels. The quantities in the boxes indicate the number of nodes in each layer of a RBM (two layers per RBM). Three stacked RBMs form a DBN. 4.2 Experimental Results of LEA-MVD vs CMA-ES and CD First Experiment.– the reconstruction error for the three algorithms compared under the training scheme using the scaled-down version of th… view at source ↗

**Figure 2.** Figure 2: Reconstruction error obtained by LEA-MVD, CMA-ES and CD for DBN training with images of [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: Reconstruction error delivered by LEA-MVD and CD for the original sized images of [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

read the original abstract

The performance of deep neural networks, such as Deep Belief Networks formed by Restricted Boltzmann Machines (RBMs), strongly depends on their training, which is the process of adjusting their parameters. This process can be posed as an optimization problem over n dimensions. However, typical networks contain tens of thousands of parameters, making this a High-Dimensional Problem (HDP). Although different optimization methods have been employed for this goal, the use of most of the Evolutionary Algorithms (EAs) becomes prohibitive due to their inability to deal with HDPs. For instance, the Covariance Matrix Adaptation Evolutionary Strategy (CMA-ES) which is regarded as one of the most effective EAs, exhibits the enormous disadvantage of requiring $O(n^2)$ memory and operations, making it unpractical for problems with more than a few hundred variables. In this paper, we introduce a novel EA that requires $O(n)$ operations and memory, but delivers competitive solutions for the training stage of RBMs with over one million variables, when compared against CMA-ES and the Contrastive Divergence algorithm, which is the standard method for training RBMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims an O(n) EA that trains RBMs with over a million variables competitively with CMA-ES, but the claim rests on an untested assumption that ignoring parameter correlations costs nothing in quality.

read the letter

The main point to take away is that the authors present an evolutionary algorithm with linear time and memory that they say can handle RBM training problems with more than a million variables and still match the solution quality of CMA-ES and contrastive divergence. That is the headline result they want readers to remember. The paper does a clear job of stating the practical barrier: CMA-ES needs quadratic resources and therefore stops being usable once the parameter count moves past a few hundred. Offering a cheaper alternative that stays linear is a reasonable direction if the performance holds. The soft spot is exactly where the stress-test note points. An algorithm forced to O(n) storage and operations cannot keep a full covariance matrix, so it must fall back to per-coordinate statistics or a rank-1 or diagonal model. RBM weights are coupled through the v_i h_j terms in the energy, which creates the kind of dependencies that covariance adaptation is meant to exploit. The abstract asserts competitiveness on the large instances but supplies no head-to-head numbers on smaller problems where CMA-ES can still run, which would be the direct test of whether the approximation loses ground. Without those checks the central claim is hard to evaluate. The citation pattern is ordinary for the area and does not raise red flags by itself. This paper is aimed at people who study evolutionary methods for machine-learning optimization or who want non-gradient training options for deep networks. A referee could usefully examine the algorithm description, the experimental protocol, and whether the large-scale results are supported by smaller-scale verification against full CMA-ES. I would send it to peer review rather than desk-reject it; the scaling question is worth a proper look even if the answer turns out to be that the linear version does not fully replace the quadratic one.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a novel evolutionary algorithm requiring O(n) operations and memory that is claimed to produce competitive solutions when training Restricted Boltzmann Machines (RBMs) with more than one million parameters, outperforming or matching both CMA-ES and the standard Contrastive Divergence method.

Significance. If the central claim holds, the result would be significant because it would demonstrate that an evolutionary method can scale to the parameter counts typical of deep networks without the quadratic memory and time costs that render CMA-ES unusable beyond a few hundred variables. Credit is due for targeting a concrete, high-impact application (RBM training) and for explicitly stating the linear-complexity bound.

major comments (2)

[Abstract] Abstract: the central claim that the O(n) EA remains competitive with CMA-ES on RBM instances exceeding 1 M variables rests on the untested assumption that a strictly linear (necessarily diagonal or separable) model can preserve search quality when the objective depends on products v_i h_j that induce strong correlations among the weight parameters; no scaling result or small-n comparison is supplied to show that the approximation does not degrade solution quality relative to full-covariance CMA-ES.
[Abstract] The manuscript supplies neither the algorithm description, pseudocode, nor the explicit update rules that would allow verification that the stated O(n) bound is achieved without hidden quadratic terms; without these, it is impossible to assess whether the method is genuinely linear or merely claims linearity while retaining implicit dependence modeling.

minor comments (1)

[Abstract] The abstract refers to 'over one million variables' but does not state the precise network architecture or the number of visible/hidden units used in the reported experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and outline the revisions we will make.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the O(n) EA remains competitive with CMA-ES on RBM instances exceeding 1 M variables rests on the untested assumption that a strictly linear (necessarily diagonal or separable) model can preserve search quality when the objective depends on products v_i h_j that induce strong correlations among the weight parameters; no scaling result or small-n comparison is supplied to show that the approximation does not degrade solution quality relative to full-covariance CMA-ES.

Authors: The referee correctly identifies that the abstract does not explicitly discuss small-n scaling behavior. The full manuscript reports experiments on RBMs with parameter counts ranging from hundreds to over one million, showing that the method remains competitive with CD on the largest instances and matches CMA-ES performance on smaller instances where the latter can be run. To strengthen the presentation, we will revise the abstract and add a dedicated paragraph in the experimental section that explicitly compares solution quality on small-n instances and discusses the empirical effect of the separable approximation on the weight correlations induced by the RBM energy function. revision: yes
Referee: [Abstract] The manuscript supplies neither the algorithm description, pseudocode, nor the explicit update rules that would allow verification that the stated O(n) bound is achieved without hidden quadratic terms; without these, it is impossible to assess whether the method is genuinely linear or merely claims linearity while retaining implicit dependence modeling.

Authors: We agree that the absence of pseudocode makes independent verification of the complexity claim more difficult. The manuscript describes the algorithm and its O(n) update rules in Section 3, but we will add explicit pseudocode together with a line-by-line complexity analysis in the revised version to eliminate any ambiguity about hidden quadratic terms. revision: yes

Circularity Check

0 steps flagged

No significant circularity; algorithm introduction is self-contained

full rationale

The paper introduces a novel O(n) evolutionary algorithm and reports empirical competitiveness on large RBM training tasks. No equations, fitted parameters, or self-referential definitions appear in the abstract or claimed derivation chain. The O(n) property is presented as a design feature of the new method rather than a prediction derived from data or prior self-citations. Competitiveness is framed as an experimental outcome against CMA-ES and CD, with no load-bearing step that reduces by construction to the inputs. This is the normal non-circular case for an algorithmic contribution paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no equations, no experimental protocol, and no implementation details, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5730 in / 1103 out tokens · 25906 ms · 2026-05-24T21:48:00.099979+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 2 internal anchors

[1]

Active covariance matrix adaptation for the (1+ 1)-cma-es

Dirk V Arnold and Nikolaus Hansen. Active covariance matrix adaptation for the (1+ 1)-cma-es. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 385–392. ACM, 2010

work page 2010
[2]

The cma evolution strategy

Multiple authors. The cma evolution strategy. urlhttp://cma.gforge.inria.fr/, 2016

work page 2016
[3]

Bottou, F

L. Bottou, F. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018

work page 2018
[4]

Shi Cheng, Bin Liu, T. O. Ting, Quande Qin, Yuhui Shi, and Kaizhu Huang. Survey on data science with population-based algorithms. Big Data Analytics, 1(1):3, Jul 2016

work page 2016
[5]

Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm

Kalyanmoy Deb and Christie Myburgh. Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, pages 653–660. ACM, 2016

work page 2016
[6]

Hansen, A

N. Hansen, A. Niederberger, L. Guzzella, and P. Koumoutsakos. A method for handling uncertainty in evolu- tionary optimization with an application to feedback control of combustion. IEEE Transactions on Evolutionary Computation, 13(1):180–197, 2009

work page 2009
[7]

Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es)

Nikolaus Hansen, Sibylle D Müller, and Petros Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es). Evolutionary computation, 11(1):1–18, 2003

work page 2003
[8]

Reducing the dimensionality of data with neural networks

Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006

work page 2006
[9]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[10]

CMA-ES with restarts for solving cec 2013 benchmark problems

Ilya Loshchilov. CMA-ES with restarts for solving cec 2013 benchmark problems. In 2013 IEEE Congress on Evolutionary Computation, pages 369–376. Ieee, 2013

work page 2013
[11]

A computationally efﬁcient limited memory cma-es for large scale optimization

Ilya Loshchilov. A computationally efﬁcient limited memory cma-es for large scale optimization. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO ’14, pages 397–404, New York, NY , USA, 2014. ACM. 8 S. I. Valdez & A. Rojas-Domínguez - An Evolutionary Algorithm of Linear Complexity

work page 2014
[12]

Population size inﬂuence on the efﬁciency of evolutionary algorithms to design water networks

Daniel Mora-Melià, F Javier Martínez-Solano, Pedro L Iglesias-Rey, and Jimmy H Gutiérrez-Bahamondes. Population size inﬂuence on the efﬁciency of evolutionary algorithms to design water networks. Procedia Engineering, 186:341–348, 2017

work page 2017
[13]

A simple modiﬁcation in cma-es achieving linear time and space complexity

Raymond Ros and Nikolaus Hansen. A simple modiﬁcation in cma-es achieving linear time and space complexity. In International Conference on Parallel Problem Solving from Nature, pages 296–305. Springer, 2008

work page 2008
[14]

Deep learning in neural networks: An overview

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85 – 117, 2015

work page 2015
[15]

Evolution of Descent Directions, pages 221–237

Alejandro Sierra Urrecho and Iván Santibáñez Koref. Evolution of Descent Directions, pages 221–237. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008

work page 2008
[16]

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR, abs/1712.06567, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[17]

Approximating the search distribution to the selection distribution in edas

S Ivvan Valdez-Peña, Arturo Hernández-Aguirre, and Salvador Botello-Rionda. Approximating the search distribution to the selection distribution in edas. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 461–468. ACM, 2009

work page 2009
[18]

Deep learning

Yoshua Bengio Yann LeCun and Geoffrey Hinton. Deep learning. nature, 521(7553):436–442, 2015

work page 2015
[19]

Suganthan

Le Zhang and P.N. Suganthan. A survey of randomized algorithms for training neural networks. Information Sciences, 364-365:146 – 155, 2016. 9

work page 2016

[1] [1]

Active covariance matrix adaptation for the (1+ 1)-cma-es

Dirk V Arnold and Nikolaus Hansen. Active covariance matrix adaptation for the (1+ 1)-cma-es. In Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 385–392. ACM, 2010

work page 2010

[2] [2]

The cma evolution strategy

Multiple authors. The cma evolution strategy. urlhttp://cma.gforge.inria.fr/, 2016

work page 2016

[3] [3]

Bottou, F

L. Bottou, F. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018

work page 2018

[4] [4]

Shi Cheng, Bin Liu, T. O. Ting, Quande Qin, Yuhui Shi, and Kaizhu Huang. Survey on data science with population-based algorithms. Big Data Analytics, 1(1):3, Jul 2016

work page 2016

[5] [5]

Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm

Kalyanmoy Deb and Christie Myburgh. Breaking the billion-variable barrier in real-world optimization using a customized evolutionary algorithm. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, pages 653–660. ACM, 2016

work page 2016

[6] [6]

Hansen, A

N. Hansen, A. Niederberger, L. Guzzella, and P. Koumoutsakos. A method for handling uncertainty in evolu- tionary optimization with an application to feedback control of combustion. IEEE Transactions on Evolutionary Computation, 13(1):180–197, 2009

work page 2009

[7] [7]

Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es)

Nikolaus Hansen, Sibylle D Müller, and Petros Koumoutsakos. Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es). Evolutionary computation, 11(1):1–18, 2003

work page 2003

[8] [8]

Reducing the dimensionality of data with neural networks

Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006

work page 2006

[9] [9]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[10] [10]

CMA-ES with restarts for solving cec 2013 benchmark problems

Ilya Loshchilov. CMA-ES with restarts for solving cec 2013 benchmark problems. In 2013 IEEE Congress on Evolutionary Computation, pages 369–376. Ieee, 2013

work page 2013

[11] [11]

A computationally efﬁcient limited memory cma-es for large scale optimization

Ilya Loshchilov. A computationally efﬁcient limited memory cma-es for large scale optimization. In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO ’14, pages 397–404, New York, NY , USA, 2014. ACM. 8 S. I. Valdez & A. Rojas-Domínguez - An Evolutionary Algorithm of Linear Complexity

work page 2014

[12] [12]

Population size inﬂuence on the efﬁciency of evolutionary algorithms to design water networks

Daniel Mora-Melià, F Javier Martínez-Solano, Pedro L Iglesias-Rey, and Jimmy H Gutiérrez-Bahamondes. Population size inﬂuence on the efﬁciency of evolutionary algorithms to design water networks. Procedia Engineering, 186:341–348, 2017

work page 2017

[13] [13]

A simple modiﬁcation in cma-es achieving linear time and space complexity

Raymond Ros and Nikolaus Hansen. A simple modiﬁcation in cma-es achieving linear time and space complexity. In International Conference on Parallel Problem Solving from Nature, pages 296–305. Springer, 2008

work page 2008

[14] [14]

Deep learning in neural networks: An overview

Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85 – 117, 2015

work page 2015

[15] [15]

Evolution of Descent Directions, pages 221–237

Alejandro Sierra Urrecho and Iván Santibáñez Koref. Evolution of Descent Directions, pages 221–237. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008

work page 2008

[16] [16]

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Felipe Petroski Such, Vashisht Madhavan, Edoardo Conti, Joel Lehman, Kenneth O. Stanley, and Jeff Clune. Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. CoRR, abs/1712.06567, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[17] [17]

Approximating the search distribution to the selection distribution in edas

S Ivvan Valdez-Peña, Arturo Hernández-Aguirre, and Salvador Botello-Rionda. Approximating the search distribution to the selection distribution in edas. In Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 461–468. ACM, 2009

work page 2009

[18] [18]

Deep learning

Yoshua Bengio Yann LeCun and Geoffrey Hinton. Deep learning. nature, 521(7553):436–442, 2015

work page 2015

[19] [19]

Suganthan

Le Zhang and P.N. Suganthan. A survey of randomized algorithms for training neural networks. Information Sciences, 364-365:146 – 155, 2016. 9

work page 2016