Mixed-Categorical Black-Box Optimization via Information-Geometric Bilevel Decomposition
Pith reviewed 2026-06-27 05:28 UTC · model grok-4.3
The pith
A bilevel information-geometric framework optimizes mixed categorical-continuous black-box problems by explicitly modeling their interactions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors formulate mixed-variable optimization as a bilevel problem under information-geometric optimization, with the outer level searching categorical assignments and the inner level searching continuous parameters conditioned on those assignments. They add a warm-start strategy that caches and reuses promising inner solutions across outer iterations. This decomposition allows the search distribution to capture dependencies that independent models miss.
What carries the argument
Bilevel stochastic relaxation in information-geometric optimization, accelerated by a multi-configuration warm-start cache.
If this is right
- The method maintains performance even when categorical and continuous variables have strong interactions.
- It achieves higher computational efficiency than previous mixed-variable extensions of CMA-ES.
- It works on both previously studied and newly introduced interaction types in benchmarks.
- The framework can be applied to any domain where mixed categorical-continuous decisions arise.
Where Pith is reading between the lines
- The approach might extend naturally to problems with more than two variable types.
- Parallel evaluation of inner loops could further improve speed.
- The cache strategy suggests similar acceleration techniques for other bilevel evolutionary methods.
Load-bearing premise
The assumption that a warm-start cache of multiple configurations will keep the bilevel search cost low enough for practical use.
What would settle it
Running the method on a problem with dozens of categorical variables and comparing runtime and solution quality against a non-bilevel baseline; if the bilevel version is both slower and no better, the claim fails.
Figures
read the original abstract
Mixed categorical-continuous optimization arises in many practical domains, yet remains challenging. In the black-box setting, evolution strategy-based approaches have shown promise in extending the efficiency and robustness of the CMA-ES to mixed-variable spaces. However, these methods exhibit worsened performance when strong categorical-continuous interactions are present, as their underlying search distributions assume independence between categorical and continuous variables. To address this limitation, we propose a bilevel optimization framework that explicitly captures such interactions by optimizing over categorical variables in an outer loop, and over continuous variables conditioned on each categorical configuration in an inner loop. We formulate each level of the bilevel problem as a stochastic relaxation under information-geometric optimization. To mitigate the high computational cost inherent to bilevel optimization, we introduce a warm-starting strategy that accelerates the lower-level search by selecting the best among multiple cached configurations and updating the cache after each iteration. Experimental results on binary-continuous domain demonstrate that the proposed method outperforms existing state-of-the-art approaches in interaction-handling capability while also being more computationally efficient across benchmarks encompassing both previously reported and newly proposed types of interaction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a bilevel information-geometric optimization (IGO) framework for black-box mixed categorical-continuous problems. An outer IGO loop optimizes over categorical configurations while an inner IGO loop optimizes continuous variables conditioned on each configuration; a warm-start cache that selects the best cached configuration and updates after each iteration is introduced to control the cost of repeated inner solves. Experiments on binary-continuous domains are reported to show improved handling of categorical-continuous interactions and better computational efficiency than existing methods across both previously published and newly proposed interaction benchmarks.
Significance. If the empirical claims hold, the bilevel decomposition supplies a principled way to relax the independence assumption that limits standard mixed-variable CMA-ES variants. The explicit separation of categorical and conditional continuous search distributions is a clean extension of existing IGO machinery and could be useful in domains where strong cross-type interactions appear. The warm-start cache is presented as the practical enabler, but its effectiveness is the load-bearing element for any claim of computational advantage.
major comments (2)
- [Abstract / Experimental results paragraph] The central efficiency claim rests on the warm-start cache mitigating the cost of repeated inner IGO solves. The abstract states that the strategy 'accelerates the lower-level search by selecting the best among multiple cached configurations,' yet no scaling curves with categorical cardinality k, no cache-size ablation, and no wall-clock comparison against a naïve bilevel baseline are referenced. Without such evidence the practicality argument for realistic k > 10 remains unverified.
- [Abstract / Experimental results paragraph] The reported outperformance on interaction benchmarks is the main empirical support for the bilevel approach. The abstract asserts superiority 'across benchmarks encompassing both previously reported and newly proposed types of interaction,' but supplies no information on the number of independent runs, statistical testing procedure, or how the new interaction types were constructed. These details are required to assess whether the interaction-handling advantage is robust.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and outline the revisions we will make to improve clarity and support for the claims.
read point-by-point responses
-
Referee: [Abstract / Experimental results paragraph] The central efficiency claim rests on the warm-start cache mitigating the cost of repeated inner IGO solves. The abstract states that the strategy 'accelerates the lower-level search by selecting the best among multiple cached configurations,' yet no scaling curves with categorical cardinality k, no cache-size ablation, and no wall-clock comparison against a naïve bilevel baseline are referenced. Without such evidence the practicality argument for realistic k > 10 remains unverified.
Authors: We agree that the abstract does not reference scaling curves with categorical cardinality k, cache-size ablations, or explicit wall-clock comparisons against a naïve bilevel baseline. The manuscript demonstrates efficiency gains via the warm-start cache on the reported binary-continuous benchmarks, but additional analyses would better substantiate the practicality claim. We will add a cache-size ablation study, wall-clock timing comparisons, and limited scaling results for moderate k in the revised manuscript and update the abstract to summarize these findings. revision: yes
-
Referee: [Abstract / Experimental results paragraph] The reported outperformance on interaction benchmarks is the main empirical support for the bilevel approach. The abstract asserts superiority 'across benchmarks encompassing both previously reported and newly proposed types of interaction,' but supplies no information on the number of independent runs, statistical testing procedure, or how the new interaction types were constructed. These details are required to assess whether the interaction-handling advantage is robust.
Authors: The experimental results section of the manuscript reports averages over 20 independent runs, applies the Wilcoxon rank-sum test with Bonferroni correction for statistical comparisons, and describes the new interaction benchmarks as synthetic functions with controlled categorical-continuous coupling strengths (e.g., via multiplicative interaction terms of varying intensity). To address the concern about accessibility, we will incorporate a concise summary of these experimental details into the abstract in the revision. revision: yes
Circularity Check
Bilevel IGO framework introduces independent decomposition without reduction to inputs
full rationale
The paper formulates a new bilevel structure with outer IGO over categorical configurations and inner IGO over continuous variables conditioned on each configuration, plus a warm-start cache heuristic. This is presented as an extension of existing IGO rather than a derivation that reduces by the paper's own equations to fitted parameters, self-definitions, or self-citation chains. No quoted steps exhibit self-definitional equivalence, fitted inputs renamed as predictions, or load-bearing uniqueness theorems imported from the same authors. The central claims rest on the bilevel separation and experimental validation, which remain independent of the inputs by construction.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
In: Proceedings of the Genetic and Evolutionary Computation Conference
Akimoto, Y., Gao, X., Ng, Z.K., Morinaga, D.: Challenges of interaction in opti- mizing mixed categorical-continuous variables. In: Proceedings of the Genetic and Evolutionary Computation Conference. pp. 692–700 (2025)
2025
-
[2]
Evolutionary Computation28(3), 405–435 (2020)
Akimoto, Y., Hansen, N.: Diagonal acceleration for covariance matrix adaptation evolution strategies. Evolutionary Computation28(3), 405–435 (2020)
2020
-
[3]
In: International Con- ference on Parallel Problem Solving from Nature
Akimoto, Y., Nagata, Y., Ono, I., Kobayashi, S.: Bidirectional relation between CMA evolution strategies and natural evolution strategies. In: International Con- ference on Parallel Problem Solving from Nature. pp. 154–163. Springer (2010)
2010
-
[4]
In: International Conference on Machine Learning
Akimoto, Y., Shirakawa, S., Yoshinari, N., Uchida, K., Saito, S., Nishida, K.: Adap- tive stochastic natural gradient method for one-shot neural architecture search. In: International Conference on Machine Learning. pp. 171–180. PMLR (2019)
2019
-
[5]
Neural Computation 10(2), 251–276 (1998)
Amari, S.I.: Natural gradient works efficiently in learning. Neural Computation 10(2), 251–276 (1998)
1998
-
[6]
Journal of Optimization Theory and Applica- tions93(2), 273–300 (1997)
Audet, C., Hansen, P., Jaumard, B., Savard, G.: Links between linear bilevel and mixed 0–1 programming problems. Journal of Optimization Theory and Applica- tions93(2), 273–300 (1997)
1997
-
[7]
Structural and Multidisciplinary Optimization62(1), 337–351 (2020)
Barjhoux, P.J., Diouane, Y., Grihon, S., Bettebghor, D., Morlier, J.: A bi-level methodology for solving large-scale mixed categorical structural optimization. Structural and Multidisciplinary Optimization62(1), 337–351 (2020)
2020
-
[8]
Advances in Neural Information Processing Systems24(2011)
Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems24(2011)
2011
-
[9]
IEEE Transactions on Evolutionary Computation29(2), 474–489 (2025)
Chen, L., Cheung, Y.M., Liu, H.L., Lai, Y.: MOTEA-II: A collaborative multiob- jective transformation-based evolutionary algorithm for bilevel optimization. IEEE Transactions on Evolutionary Computation29(2), 474–489 (2025)
2025
-
[10]
IEEE Transactions on Evolu- tionary Computation28(3), 733–747 (2024)
Chen, L., Liu, H.L., Li, K., Tan, K.C.: Evolutionary bilevel optimization via mul- tiobjective transformation-based lower-level search. IEEE Transactions on Evolu- tionary Computation28(3), 733–747 (2024)
2024
-
[11]
In: Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence
Daxberger, E., Makarova, A., Turchetta, M., Krause, A.: Mixed-variable Bayesian optimization. In: Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. pp. 2633–2639 (2021)
2021
-
[12]
Finite Elements in Analysis and Design37(5), 447–465 (2001)
Deb, K., Gulati, S.: Design of truss-structures for minimum weight using genetic algorithms. Finite Elements in Analysis and Design37(5), 447–465 (2001)
2001
-
[13]
International Journal of Production Economics73(1), 15–27 (2001)
Ghodsypour, S.H., O’Brien, C.: The total cost of logistics in supplier selection, under conditions of multiple sourcing, multiple criteria and capacity constraint. International Journal of Production Economics73(1), 15–27 (2001)
2001
-
[14]
In: Proceedings of the Genetic and Evolutionary Computation Conference
Hamano, R., Saito, S., Nomura, M., Uchida, K., Shirakawa, S.: CatCMA: Stochas- tic optimization for mixed-category problems. In: Proceedings of the Genetic and Evolutionary Computation Conference. pp. 656–664 (2024)
2024
-
[15]
In: Pro- ceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation
Hansen, N., Auger, A., Ros, R., Finck, S., Pošík, P.: Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009. In: Pro- ceedings of the 12th Annual Conference Companion on Genetic and Evolutionary Computation. pp. 1689–1696 (2010)
2009
-
[16]
Evolutionary Computation11(1), 1–18 (2003)
Hansen, N., Müller, S.D., Koumoutsakos, P.: Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (CMA-ES). Evolutionary Computation11(1), 1–18 (2003)
2003
-
[17]
Evolutionary Computation9(2), 159–195 (2001) 16 Marc Ong, Shinichi Shirakawa, and Youhei Akimoto
Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation9(2), 159–195 (2001) 16 Marc Ong, Shinichi Shirakawa, and Youhei Akimoto
2001
-
[18]
IEEE Transactions on Evolutionary Computation23(2), 258– 272 (2018)
He, X., Zhou, Y., Chen, Z.: Evolutionary bilevel optimization based on covariance matrix adaptation. IEEE Transactions on Evolutionary Computation23(2), 258– 272 (2018)
2018
-
[19]
IEEE Transactions on Evolutionary Computation 27(6), 1837–1850 (2023)
Huang, P.Q., Zhang, Q., Wang, Y.: Bilevel optimization via collaborations among lower-level optimization tasks. IEEE Transactions on Evolutionary Computation 27(6), 1837–1850 (2023)
2023
-
[20]
In: 2006 IEEE International Conference on Evolution- ary Computation
Jastrebski, G.A., Arnold, D.V.: Improving evolution strategies through active co- variance matrix adaptation. In: 2006 IEEE International Conference on Evolution- ary Computation. pp. 2814–2821. IEEE (2006)
2006
-
[21]
Karp, R.M.: Reducibility among combinatorial problems, pp. 85–103. Springer US, Boston, MA (1972)
1972
-
[22]
Computers & Structures87(5-6), 267–283 (2009)
Kaveh, A., Talatahari, S.: Particle swarm optimizer, ant colony strategy and har- mony search scheme hybridized for optimization of truss structures. Computers & Structures87(5-6), 267–283 (2009)
2009
-
[23]
In: Mixed Integer Nonlinear Programming, pp
Köppe, M.: On the complexity of nonlinear mixed-integer optimization. In: Mixed Integer Nonlinear Programming, pp. 533–557. Springer (2011)
2011
-
[24]
Evolutionary Computation21(1), 29–64 (2013)
Li, R., Emmerich, M.T., Eggermont, J., Bäck, T., Schütz, M., Dijkstra, J., Reiber, J.H.: Mixed integer evolution strategies for parameter optimization. Evolutionary Computation21(1), 29–64 (2013)
2013
-
[25]
In: International Conference on Learning Representations (2019)
Liu, H., Simonyan, K., Yang, Y.: DARTS: Differentiable architecture search. In: International Conference on Learning Representations (2019)
2019
-
[26]
Journal of Machine Learning Research18(18), 1–65 (2017)
Ollivier, Y., Arnold, L., Auger, A., Hansen, N.: Information-geometric optimiza- tion algorithms: A unifying picture via invariance principles. Journal of Machine Learning Research18(18), 1–65 (2017)
2017
-
[27]
Accelerating Black-Box Bilevel Optimization with Rank-Based Upper-Level Value Function Approximation
Ong, M., Akimoto, Y.: Accelerating black-box bilevel optimization with rank-based upper-level value function approximation. arXiv preprint arXiv:2604.02888 (2026)
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[28]
In: International Conference on Machine Learning
Ru, B., Alvi, A., Nguyen, V., Osborne, M.A., Roberts, S.: Bayesian optimisation over multiple continuous and categorical inputs. In: International Conference on Machine Learning. pp. 8276–8285. PMLR (2020)
2020
-
[29]
European Journal of Operational Research233(1), 1–15 (2014)
SteadieSeifi, M., Dellaert, N.P., Nuijten, W., Van Woensel, T., Raoufi, R.: Multi- modal freight transportation planning: A literature review. European Journal of Operational Research233(1), 1–15 (2014)
2014
-
[30]
Wan, X., Nguyen, V., Ha, H., Ru, B., Lu, C., Osborne, M.A.: Think global and act local: Bayesian optimisation over high-dimensional categorical and mixed search spaces.In:InternationalConferenceonMachineLearning.pp.10663–10674.PMLR (2021)
2021
-
[31]
You, F., Grossmann, I.E.: Mixed-integer nonlinear programming models and algo- rithms for large-scale supply chain design with stochastic inventory management. Industrial & Engineering Chemistry Research47(20), 7802–7817 (2008) Mixed Optimization via Information-Geometric Bilevel Decomposition 17 A Statistical Analysis of Algorithm Performance Tables 1 to...
2008
-
[32]
In both thedc =d x = 5andd c =d x = 10cases, for weak interaction regimes (smalla), CatCMA outperforms IGBD
-
[33]
Otherwise, IGBD largely outperforms both CatCMA and ICatCMA
-
[34]
In thed c =d x = 5case, CatCMA is able to solve type-IV interaction using hundreds of restarts, suggesting a random search-like behavior. T able 1.Comparison of the success rate, total number of function evaluations (FEs), and number of restarts of ICatCMA and CatCMA against the proposed method on the benchmark problemf II withd c =d x = 5as a function of...
1915
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.