
arxiv: 2606.02016 · v1 · pith:WXPLI4DWnew · submitted 2026-06-01 · 💻 cs.LG

Evaluating Real-World Generalizability of Algorithm Selection Models

Pith reviewed 2026-06-28 15:47 UTC · model grok-4.3

classification 💻 cs.LG
keywords algorithm selection · generalizability · optimization · benchmarks · real-world problems · robotics · UAV path planning · cross-domain evaluation

The pith

Algorithm selection models show uneven transfer from synthetic benchmarks to real-world robotics and UAV problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Algorithm selection models use problem features to choose which optimizer will work best on a new task. This paper tests those models by training them on standard academic test collections and then applying them to actual engineering tasks in robot trajectory design and unmanned aerial vehicle routing. The evaluation swaps the training and test sets across the four collections to track performance changes. Results map the points at which accuracy holds and the points at which it drops, revealing domain-specific obstacles that affect deployment outside controlled benchmarks.
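To make the selection step concrete, here is a minimal Python sketch. It assumes each problem instance is summarized by a numeric feature vector and labeled with the name of the optimizer that performed best on it. The 1-nearest-neighbor rule, the feature values, and the optimizer names are illustrative assumptions, not the paper's model or data.

```python
from collections import Counter
import math


def nearest_neighbor_selector(train_features, train_best):
    """Return a function that picks the optimizer of the closest training instance."""
    def select(features):
        distances = [math.dist(features, f) for f in train_features]
        return train_best[distances.index(min(distances))]
    return select


def single_best_selector(train_best):
    """Baseline: always pick the optimizer that wins most often in training."""
    winner = Counter(train_best).most_common(1)[0][0]
    return lambda features: winner


# Hypothetical toy data: two landscape features per instance (invented values).
train_features = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1), (0.8, 0.2), (0.15, 0.85)]
train_best = ["DE", "DE", "PSO", "PSO", "DE"]

select = nearest_neighbor_selector(train_features, train_best)
baseline = single_best_selector(train_best)

new_instance = (0.85, 0.15)
print(select(new_instance))    # PSO
print(baseline(new_instance))  # DE
```

A feature-aware selector is only worth deploying if it beats the feature-blind baseline, which is why a comparison of this kind is a natural reference point when judging transfer.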

Core claim

Through a systematic cross-benchmark evaluation, the study analyzes how AS models transfer between domains, identifies where generalization succeeds or breaks down, and highlights the challenges that arise when applying AS in realistic, domain-specific contexts. Our findings provide insights into the robustness of current AS approaches and inform the development of more reliable, broadly applicable AS systems for real-world optimization.

What carries the argument

Cross-benchmark evaluation of AS models trained and tested across synthetic suites (BBOB, CEC) and real-world sets (robotics trajectory optimization, UAV path-planning).
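The train-on-one, test-on-another protocol can be sketched as a loop over all ordered benchmark pairs. The sketch below is an assumption-laden illustration: the model is a trivial majority-vote stand-in, the labels are invented, and the same-benchmark cells reuse the training data, whereas a real study would hold out instances for them.

```python
from collections import Counter
from itertools import product


def fit_majority(labels):
    """Stand-in model: remembers the most frequent best optimizer."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(prediction, labels):
    """Fraction of instances whose best optimizer matches the prediction."""
    return sum(label == prediction for label in labels) / len(labels)


def cross_benchmark_matrix(benchmarks):
    """Train on each benchmark, test on each benchmark.

    Returns a dict mapping (train_name, test_name) to accuracy.
    """
    results = {}
    for train_name, test_name in product(benchmarks, repeat=2):
        model = fit_majority(benchmarks[train_name])
        results[(train_name, test_name)] = accuracy(model, benchmarks[test_name])
    return results


# Hypothetical best-optimizer labels per instance; not taken from the paper.
benchmarks = {
    "BBOB": ["DE", "DE", "PSO", "DE"],
    "CEC": ["DE", "PSO", "DE", "DE"],
    "robotics": ["PSO", "PSO", "DE", "PSO"],
    "UAV": ["PSO", "DE", "PSO", "PSO"],
}

matrix = cross_benchmark_matrix(benchmarks)
for (train_name, test_name), acc in matrix.items():
    print(f"train={train_name:9s} test={test_name:9s} accuracy={acc:.2f}")
```

Any model with a fit step and a predict step could replace `fit_majority`; the pairing loop is the part that carries the cross-benchmark design.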

Load-bearing premise

The two chosen real-world problem sets adequately represent the distribution of practical optimization landscapes outside the synthetic benchmarks.

What would settle it

If models trained on synthetic data reach, on the robotics and UAV sets, the accuracy that models trained within those domains achieve, that result would contradict the reported breakdown in generalization.
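One way to operationalize this check is to subtract each cross-domain accuracy from the within-domain accuracy of the same test set and see whether the gap stays inside a tolerance. The sketch below assumes accuracies are already available as a dictionary keyed by (train, test) pairs; the numbers and the 0.05 tolerance are invented for illustration and do not come from the paper.

```python
def transfer_gaps(accuracy, tolerance=0.05):
    """Compare each cross-domain cell with the within-domain accuracy of its test set.

    Returns a dict mapping (train_name, test_name) to (gap, matches_within_domain).
    """
    report = {}
    for (train_name, test_name), cross_acc in accuracy.items():
        if train_name == test_name:
            continue  # within-domain cells serve only as the reference
        gap = accuracy[(test_name, test_name)] - cross_acc
        report[(train_name, test_name)] = (gap, gap <= tolerance)
    return report


# Hypothetical accuracies, invented for illustration only.
accuracy = {
    ("BBOB", "BBOB"): 0.80,
    ("robotics", "robotics"): 0.75,
    ("BBOB", "robotics"): 0.40,
    ("robotics", "BBOB"): 0.78,
}

for pair, (gap, matches) in transfer_gaps(accuracy).items():
    status = "matches within-domain" if matches else "falls short"
    print(pair, round(gap, 2), status)
```

With these made-up numbers, the synthetic-to-robotics cell falls short, which is the pattern that would have to disappear for the claim to be refuted.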

Figures

Figures reproduced from arXiv: 2606.02016 by Eva Tuba, Gjorgjina Cenikj, Jakub Kudela, Tome Eftimov.

Figure 1: Mean algorithm performance per benchmark for …
Figure 3: Mean algorithm performance per benchmark for …
Figure 4: Mean algorithm performance per benchmark for …
Figure 5: Difference between the performance of the RF …
Figure 7: Difference between the performance of the RF …
Figure 8: Difference between the performance of the RF …
Figure 9: Difference between the performance of the RF model and the dummy model in the setting where benchmarks are …
Figure 11: Benchmark similarity based on normalized cross …
Figure 12: Benchmark similarity based on normalized cross …

Original abstract

Algorithm Selection (AS) aims to automatically identify the most suitable optimization algorithm for a given problem instance by leveraging measurable problem characteristics and historical performance data. In this study, we investigate the generalization ability of AS models across both synthetic and real-world optimization landscapes. We consider two widely used academic benchmark suites (BBOB and CEC) and two real-world problem sets (robotics trajectory optimization tasks and unmanned aerial vehicle path-planning problems). Through a systematic cross-benchmark evaluation, we analyze how AS models transfer between domains, identify where generalization succeeds or breaks down, and highlight the challenges that arise when applying AS in realistic, domain-specific contexts. Our findings provide insights into the robustness of current AS approaches and inform the development of more reliable, broadly applicable AS systems for real-world optimization.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that through systematic cross-benchmark evaluation of Algorithm Selection (AS) models on synthetic suites (BBOB and CEC) and two real-world problem sets (robotics trajectory optimization tasks and UAV path-planning problems), it identifies where AS generalization succeeds or breaks down and highlights challenges in realistic, domain-specific contexts.

Significance. If the empirical results on transfer performance are robustly demonstrated with appropriate controls and analysis, the work would provide useful guidance on the limitations of current AS methods when applied outside synthetic benchmarks, informing more reliable real-world AS systems.

major comments (1)
  1. [Abstract] The central claim that cross-benchmark evaluation reveals where AS generalization succeeds or breaks down in realistic contexts depends on the two chosen real-world sets being representative of practical optimization landscapes. The manuscript neither characterizes these sets (e.g., their constraint structures, noise profiles, or dimensionality distributions) nor compares them with other real-world problems, so any observed transfer patterns could be artifacts of narrow sampling rather than general properties of AS models.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for this comment on the scope of our real-world problem sets. We respond point by point below.

  1. Referee: [Abstract] The central claim that cross-benchmark evaluation reveals where AS generalization succeeds or breaks down in realistic contexts depends on the two chosen real-world sets being representative of practical optimization landscapes. The manuscript neither characterizes these sets (e.g., their constraint structures, noise profiles, or dimensionality distributions) nor compares them with other real-world problems, so any observed transfer patterns could be artifacts of narrow sampling rather than general properties of AS models.

    Authors: We agree that the manuscript does not supply an explicit feature-level characterization or comparison of the robotics trajectory optimization and UAV path-planning sets against a wider sample of real-world problems. Our study presents these two domains as concrete, distinct examples of realistic optimization tasks rather than as a statistically representative sample of all practical landscapes. To address the concern, the revised manuscript will add a dedicated subsection that reports dimensionality distributions, constraint types, noise characteristics, and variable bounds for both real-world sets, together with a qualitative discussion of how these features relate to typical real-world optimization challenges reported in the literature. This addition will clarify the intended scope of the observed transfer results without altering the core empirical findings. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical cross-benchmark evaluation with no derivations or fitted predictions


The paper is an empirical evaluation study that performs systematic cross-benchmark testing of algorithm selection models on synthetic (BBOB, CEC) and real-world (robotics, UAV) problem sets. No equations, predictions, or self-citations are present that reduce any claimed result to the input data by construction. The central claims rest on observed transfer performance across held-out domains rather than any self-definitional, fitted-input, or uniqueness-imported mechanism. This is a standard non-circular empirical design.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5663 in / 913 out tokens · 15521 ms · 2026-06-28T15:47:13.288821+00:00 · methodology

