Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms

Andreas Faust; Juergen Becker; Sven Nitzsche

arxiv: 2606.19369 · v1 · pith:BHOJLVOEnew · submitted 2026-06-11 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-27 07:02 UTC · model grok-4.3

keywords: zero-inflated Gaussian · estimation-of-distribution algorithms · sparse optimization · black-box optimization · evolutionary algorithms · Lunar Lander · latent variable models

The pith

Multivariate zero-inflated Gaussian distributions let estimation-of-distribution algorithms jointly optimize sparsity patterns and active parameter values in black-box problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Estimation-of-distribution algorithms work by fitting a probability distribution to the best solutions found so far and sampling the next generation from it. The paper introduces multivariate zero-inflated Gaussian distributions to extend this approach to sparse parameter spaces, where most entries of a good solution are exactly zero. The model uses a latent structure with separate indicator and value components to capture which parameters are active, what their values are, and how they correlate. Latent parameters remain identifiable from the observed samples, supporting practical estimators that recover the correlation structure. On the Lunar Lander task the resulting algorithm reaches better performance with far fewer active parameters than dense Gaussian EDAs or methods that impose sparsity by hand.
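The fit-then-sample loop described above can be sketched for the dense Gaussian case, which is the baseline the paper extends. This is a minimal illustration, not the paper's implementation: the function `gaussian_eda`, the toy objective `shifted_sphere`, the vector `TARGET`, and all hyperparameters (population size, elite fraction, initial covariance) are assumptions chosen for the example.

```python
import numpy as np

# Hypothetical toy optimum, used only for this illustration.
TARGET = np.array([1.0, -2.0, 0.5, 0.0, 3.0])

def shifted_sphere(x):
    """Toy black-box objective: squared distance to TARGET (lower is better)."""
    return float(np.sum((x - TARGET) ** 2))

def gaussian_eda(objective, dim, pop_size=200, elite_frac=0.2,
                 generations=60, seed=0):
    """Minimize `objective` with a dense, full-covariance Gaussian EDA."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    cov = 9.0 * np.eye(dim)              # broad initial search distribution
    n_elite = max(2, int(elite_frac * pop_size))
    best_x, best_f = None, np.inf
    for _ in range(generations):
        # 1. Sample a population from the current model.
        pop = rng.multivariate_normal(mean, cov, size=pop_size)
        fitness = np.apply_along_axis(objective, 1, pop)
        # 2. Select the best individuals (lowest objective values).
        order = np.argsort(fitness)
        elite = pop[order[:n_elite]]
        if fitness[order[0]] < best_f:
            best_x, best_f = pop[order[0]].copy(), float(fitness[order[0]])
        # 3. Refit the Gaussian to the elite set; the small jitter keeps
        #    the covariance positive definite as the search contracts.
        mean = elite.mean(axis=0)
        cov = np.cov(elite, rowvar=False) + 1e-8 * np.eye(dim)
    return best_x, best_f

best_x, best_f = gaussian_eda(shifted_sphere, dim=5)
print("best objective:", best_f)
print("best point:", np.round(best_x, 3))
```

Note that every coordinate of a sample from this model is nonzero with probability one, which is why a dense Gaussian cannot express sparsity on its own.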

Core claim

We close this gap by proposing multivariate zero-inflated Gaussian (ZIG) distributions as EDA sampling laws. A latent Gaussian model with separate indicator and value dimensions represents sparsity patterns, correlations among active parameters, and the interactions between the two, so sparsity patterns and active values are optimized jointly, hierarchy-free. We show that the latent parameters of this model are identifiable from observed samples, unlike in the missing-data settings where related constructions originate, and introduce practical amortized inversion-based estimators for them. The estimators accurately recover latent correlation structures, and on the Lunar Lander benchmark the resulting ZIG-EDA converges faster and reaches higher final returns than a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse EDA, while finding controllers with only a small fraction of parameters active.

What carries the argument

Multivariate zero-inflated Gaussian (ZIG) distributions: a latent Gaussian model with separate indicator and value dimensions that jointly represents sparsity patterns, active values, and their correlations.
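One way to picture such a model is a sampler that draws a joint Gaussian over value latents V_i and indicator latents M_i, then zeroes every coordinate whose indicator falls below a threshold. The sketch below assumes that thresholding rule, with the threshold set so that P(M_i > tau_i) equals the activation probability; the source text does not spell out the paper's exact parameterization, so the function `sample_zig`, its arguments, and the example correlation matrix are all hypothetical.

```python
import numpy as np
from statistics import NormalDist

def sample_zig(n, p_active, mu, sigma, latent_corr, seed=0):
    """Draw n samples from an assumed latent-threshold ZIG construction.

    latent_corr is a (2d, 2d) correlation matrix over the standardized
    latents ordered as [V_1..V_d, M_1..M_d].
    Returns the sparse samples and the boolean activity mask.
    """
    rng = np.random.default_rng(seed)
    p_active = np.asarray(p_active, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.size
    # Standard-normal latents with the requested correlation structure.
    z = rng.multivariate_normal(np.zeros(2 * d), latent_corr, size=n)
    v_std, m = z[:, :d], z[:, d:]
    # Assumed activation rule: coordinate i is active when M_i > tau_i,
    # with tau_i chosen so that P(M_i > tau_i) = p_active[i].
    tau = np.array([NormalDist().inv_cdf(1.0 - p) for p in p_active])
    active = m > tau
    values = mu + sigma * v_std
    return np.where(active, values, 0.0), active

# Two parameters; latent order is [V_1, V_2, M_1, M_2].
corr = np.eye(4)
corr[0, 2] = corr[2, 0] = 0.6   # value-indicator correlation for parameter 1
corr[0, 1] = corr[1, 0] = 0.3   # value-value correlation
x, active = sample_zig(200_000, [0.1, 0.5], [2.0, -1.0], [1.0, 0.5], corr)
print("active fractions:", active.mean(axis=0))
print("mean of active values, parameter 1:", x[active[:, 0], 0].mean())
```

Under this construction the mean of the active values of parameter 1 comes out above its latent mean of 2, because a positive value-indicator correlation makes large values more likely to be switched on. That is the kind of interaction a factorized mask-times-value model cannot represent.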

If this is right

ZIG-EDAs converge faster than dense Gaussian EDAs on the tested benchmark.
They reach higher final returns while activating only a small fraction of the parameters.
The latent parameters remain identifiable from observed samples, enabling the estimators.
The approach eliminates the need for separate support-set and value optimization stages.
The estimators recover the underlying correlation structures from the sampled data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same latent construction could be inserted into other model-based optimizers that currently assume dense distributions.
Identifiability may allow direct use of the fitted ZIG as an interpretable model of solution structure after optimization finishes.
Scaling the amortized estimators to higher-dimensional parameter spaces would test whether the joint modeling advantage persists.
If the identifiability result holds for other zero-inflated families, similar extensions could apply to discrete or mixed search spaces.

Load-bearing premise

Samples drawn during the optimization process contain enough statistical information to recover the latent parameters of the zero-inflated model.

What would settle it

The claim would be refuted if, on synthetic data generated from a known ZIG distribution, the amortized estimators failed to recover the true latent correlation matrix within sampling error.
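A synthetic check of this kind can be sketched for one block of the latent correlation matrix, the indicator-indicator block. The code below does not reproduce the paper's amortized estimators. It swaps in a standard tetrachoric-style inversion: the co-activation frequency of two thresholded Gaussians is a monotone function of their latent correlation, so the correlation can be found by bisection. The function names, the quadrature order, and the simulation settings are assumptions for illustration.

```python
import numpy as np
from statistics import NormalDist

_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(32)

def coactivation_prob(rho, tau1, tau2):
    """P(M1 > tau1, M2 > tau2) for standard bivariate normal latents.

    Uses the identity dP/d(rho) = bivariate normal density at (tau1, tau2),
    integrated from 0 to rho with Gauss-Legendre quadrature.
    """
    nd = NormalDist()
    p_indep = (1.0 - nd.cdf(tau1)) * (1.0 - nd.cdf(tau2))
    r = 0.5 * rho * (_NODES + 1.0)            # map [-1, 1] onto [0, rho]
    s = 1.0 - r ** 2
    dens = np.exp(-(tau1**2 - 2.0*r*tau1*tau2 + tau2**2) / (2.0*s))
    dens /= 2.0 * np.pi * np.sqrt(s)
    return p_indep + 0.5 * rho * float(np.sum(_WEIGHTS * dens))

def latent_indicator_corr(active_a, active_b, tol=1e-6):
    """Invert the observed co-activation frequency for the latent correlation."""
    nd = NormalDist()
    tau1 = nd.inv_cdf(1.0 - float(active_a.mean()))
    tau2 = nd.inv_cdf(1.0 - float(active_b.mean()))
    p11 = float(np.mean(active_a & active_b))
    lo, hi = -0.99, 0.99
    while hi - lo > tol:                      # bisection: prob increases in rho
        mid = 0.5 * (lo + hi)
        if coactivation_prob(mid, tau1, tau2) < p11:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Synthetic data with a known latent indicator correlation.
rng = np.random.default_rng(1)
rho_true = 0.5
m = rng.multivariate_normal([0.0, 0.0], [[1.0, rho_true], [rho_true, 1.0]],
                            size=200_000)
tau = np.array([NormalDist().inv_cdf(1.0 - p) for p in (0.3, 0.2)])
active = m > tau
rho_hat = latent_indicator_corr(active[:, 0], active[:, 1])
print("true:", rho_true, "recovered:", round(rho_hat, 3))
```

The same pattern (simulate from known latents, estimate, compare against a tolerance tied to the sample size) would apply to the value-value and value-indicator blocks, which need estimators that this sketch does not cover.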

Figures

Figures reproduced from arXiv: 2606.19369 by Andreas Faust, Juergen Becker, Sven Nitzsche.

**Figure 2.** Effect of correlation between V_i and M_i. Parameters: activation probability p = 0.1, μ = 2, σ = 1, corr(V_i, M_i) = 0.6. The red curve shows what the density of the Gaussian component would be if corr(V_i, M_i) = 0, in which case μ and σ are directly recoverable. The dashed curve shows the observed density under positive correlation: the estimated μ̂ and σ̂ are shifted relative to the latent values. To reso…

**Figure 3.** Recovered vs. true off-diagonal correlations. The dashed line marks the identity.

**Figure 4.** Absolute correlation errors in value–value blocks as a function of …

**Figure 5.** Environment return (unpenalized) of the best individual found so far, over 100 generations.

**Figure 6.** Number of active (nonzero) parameters of the best individual found so far, for the three …

**Figure 7.** Final unpenalized scores after 100 generations (mean over 10 runs; error bars show one …

**Figure 9.** Number of active parameters (out of 90) at fixed generations (mean over 10 runs; error bars show one standard deviation).

**Figure 10.** Temporal overlay of a successful landing episode under the optimized quadratic controller.

**Figure 11.** Progress of the ZIG-EDA optimization over 100 generations in the configuration of …
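The shift described in the Figure 2 caption can be worked out in closed form under one assumed activation rule: M_i is standard normal and parameter i is active when M_i exceeds a threshold τ with P(M_i > τ) = p. The symbols ρ (the value-indicator correlation), ε_i, τ, and λ are introduced here for illustration, and the paper's own parameterization may differ.

```latex
\begin{align*}
V_i &= \mu + \sigma\bigl(\rho M_i + \sqrt{1-\rho^2}\,\varepsilon_i\bigr),
  \qquad M_i,\ \varepsilon_i \sim \mathcal{N}(0,1)\ \text{independent},\\
\tau &= \Phi^{-1}(1-p), \qquad
  \lambda(\tau) = \frac{\varphi(\tau)}{1-\Phi(\tau)} = \frac{\varphi(\tau)}{p},\\
\mathbb{E}\bigl[V_i \mid M_i > \tau\bigr] &= \mu + \sigma\rho\,\lambda(\tau),\\
\operatorname{Var}\bigl[V_i \mid M_i > \tau\bigr] &=
  \sigma^2\Bigl(1 - \rho^2\,\lambda(\tau)\bigl(\lambda(\tau)-\tau\bigr)\Bigr).
\end{align*}
```

With the caption's values p = 0.1, μ = 2, σ = 1, and ρ = 0.6, this gives τ ≈ 1.28 and λ(τ) ≈ 1.75, so the active values have mean ≈ 3.05 and standard deviation ≈ 0.84 rather than the latent 2 and 1. Under this assumption, naive moments of the nonzero samples are therefore biased estimates of μ and σ whenever ρ ≠ 0.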

Original abstract

Estimation-of-distribution algorithms (EDAs) are a powerful class of evolutionary methods for black-box optimization, especially when little is known about the structure of the objective. Whereas classical evolutionary algorithms rely on hand-designed mutation and crossover operators, hard to devise for unknown problem structures, and a source of bias, EDAs sidestep operator design entirely: they fit a probability distribution to the best individuals and sample the next generation from it. EDAs are well established on continuous parameter spaces, but they have not previously been generalized to sparse ones, in which most coefficients of a good solution are exactly zero. Existing sparse black-box optimizers therefore reintroduce exactly what EDAs were designed to avoid: hand-crafted sparsity operators, bi-level schemes alternating between support set and active values, zeroing thresholds, and other baked-in assumptions. We close this gap by proposing multivariate zero-inflated Gaussian (ZIG) distributions as EDA sampling laws. A latent Gaussian model with separate indicator and value dimensions represents sparsity patterns, correlations among active parameters, and the interactions between the two, so sparsity patterns and active values are optimized jointly, hierarchy-free. We show that the latent parameters of this model are identifiable from observed samples, unlike in the missing-data settings where related constructions originate, and introduce practical amortized inversion-based estimators for them. The estimators accurately recover latent correlation structures, and on the Lunar Lander benchmark the resulting ZIG-EDA converges faster and reaches higher final returns than a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse EDA, while finding controllers with only a small fraction of parameters active.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ZIG-EDA builds sparsity directly into the EDA sampling distribution, but the identifiability claim is asserted without visible support.

The paper introduces multivariate zero-inflated Gaussians as the sampling law inside estimation-of-distribution algorithms. This lets the model learn both the support of active parameters and their values jointly from the same distribution, without separate zeroing steps or bi-level schemes.

What stands out is the move to treat sparsity as a property of the probability model itself rather than an added operator. The authors state that the latent parameters remain identifiable from observed samples, in contrast to the missing-data settings where zero-inflated constructions usually appear, and they supply amortized estimators to fit the model. The Lunar Lander runs then show faster convergence, higher returns, and sparser controllers than a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse variant.

The load-bearing piece is the identifiability result. The abstract claims it holds, yet supplies no conditions, proof sketch, or discussion of how the joint distribution over indicators and values avoids the usual degeneracies when a Gaussian component can produce exact zeros or when the mixing probability interacts with the mean. If that step does not go through, the estimators lose consistency and the claimed joint optimization collapses. The experiments cannot be assessed without the derivations and implementation details.

The work is aimed at people who already use EDAs on continuous black-box problems and now face high-dimensional cases where most coefficients should be zero, such as neural controller design. A reader looking for a modeling change rather than another heuristic layer will find a clear direction here.

It deserves peer review. The gap it targets is real, the proposed construction is a direct modeling step, and the experimental direction is relevant even if the current evidence is thin.

Referee Report

1 major / 1 minor

Summary. The paper proposes multivariate zero-inflated Gaussian (ZIG) distributions as sampling distributions for estimation-of-distribution algorithms (EDAs) to handle sparse parameter spaces in black-box optimization. It models sparsity via a latent Gaussian with separate indicator and value dimensions, enabling joint optimization of support patterns and active values without hand-crafted operators or bi-level schemes. The authors assert that the latent parameters (means, covariances, zero-inflation probabilities) are identifiable from observed samples—unlike in missing-data settings—and introduce amortized inversion-based estimators. On the Lunar Lander benchmark, the resulting ZIG-EDA is reported to converge faster, achieve higher returns, and produce controllers with only a small fraction of active parameters, outperforming a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse EDA.

Significance. If the identifiability claim holds and the estimators prove consistent, the work would extend EDAs to sparse domains while preserving their core advantage of avoiding hand-designed operators. The Lunar Lander results suggest empirical utility for high-dimensional controller optimization. The contribution hinges on the theoretical identifiability result and the practical reliability of the amortized estimators; without those, the joint-optimization advantage and performance claims cannot be substantiated.

major comments (1)

[Abstract] The assertion that latent ZIG parameters are identifiable from observed samples (unlike missing-data constructions) is load-bearing for estimator consistency and the claim of hierarchy-free joint optimization of support and values. No explicit conditions, proof sketch, or derivation addressing potential degeneracies (e.g., when the Gaussian component can produce exact zeros or when mixing weights interact with means) is supplied in the abstract; this must be provided with concrete uniqueness conditions in the methods section.

minor comments (1)

[Abstract] Quantitative details on the Lunar Lander experiments (e.g., exact fraction of active parameters, mean returns with standard errors, number of runs) would strengthen the performance claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful review and for highlighting the centrality of the identifiability result. We will revise the manuscript to supply the requested explicit conditions, proof sketch, and degeneracy analysis in the methods section.

Referee: [Abstract] The assertion that latent ZIG parameters are identifiable from observed samples (unlike missing-data constructions) is load-bearing for estimator consistency and the claim of hierarchy-free joint optimization of support and values. No explicit conditions, proof sketch, or derivation addressing potential degeneracies (e.g., when the Gaussian component can produce exact zeros or when mixing weights interact with means) is supplied in the abstract; this must be provided with concrete uniqueness conditions in the methods section.

Authors: We agree that the abstract assertion would be strengthened by explicit supporting material. The current manuscript states the identifiability result but does not yet include the detailed conditions or proof sketch in the methods. We will add a new subsection to the methods that states the concrete uniqueness conditions under which the latent means, covariances, and zero-inflation probabilities are recoverable from observed samples, provides a proof sketch, and explicitly addresses the noted degeneracies (Gaussian component producing exact zeros; mixing-weight/mean interactions). This addition will also clarify the distinction from missing-data constructions.
Revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain self-contained with no equations or self-citation reductions visible

The provided abstract and text contain no equations, fitted parameters renamed as predictions, or load-bearing self-citations. The identifiability claim is asserted as a modeling result distinct from missing-data cases, without reducing to prior author work or input data by construction. Central proposal of ZIG-EDA and amortized estimators stands as independent content; no step matches any enumerated circularity pattern.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central contribution rests on a new latent-variable distribution whose identifiability is asserted and whose parameters are estimated from samples; no numerical free parameters are named in the abstract.

axioms (1)

Domain assumption: a latent Gaussian model with separate indicator and value dimensions represents sparsity patterns, correlations among active parameters, and interactions between the two.
Core modeling premise stated in the abstract as the basis for joint optimization of support and values.

invented entities (1)

Multivariate zero-inflated Gaussian (ZIG) distribution (no independent evidence)
Purpose: sampling law inside EDAs that encodes both sparsity pattern and active values in one joint distribution.
New probabilistic object introduced to close the gap described in the abstract.
