BBOmix: A Tabular Benchmark for Hyperparameter Optimization of Unsupervised Biological Representation Learning
Pith reviewed 2026-06-28 06:49 UTC · model grok-4.3
The pith
BBOmix supplies 105000 hyperparameter evaluations of autoencoders on real multi-omics datasets to enable systematic study of unsupervised representation learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BBOmix is the first open tabular benchmark for hyperparameter optimization of unsupervised representation learning on real-world biological data; it records 105000 evaluations across four autoencoder architectures and seven multi-omics modalities from TCGA and SCHC, quantifies the correlation between reconstruction loss and downstream utility, and supplies baseline results for single-fidelity, multi-fidelity, and transfer-learning HPO methods.
What carries the argument
BBOmix, the collection of 105000 tabulated hyperparameter evaluations that serves as a reusable testbed for comparing HPO algorithms on biological autoencoders.
If this is right
- Researchers can now compare new HPO algorithms against published baselines without repeating the full evaluation budget.
- Optimization routines can be chosen according to the measured correlation between reconstruction loss and downstream task performance rather than by assumption.
- Default hyperparameter settings can be replaced by configurations already shown to work across the recorded modalities.
- Future work can extend the benchmark by adding more runs or modalities while reusing the same tabular format.
Where Pith is reading between the lines
- The same tabular format could be applied to other unsupervised models such as variational autoencoders or contrastive learners on biological data.
- If reconstruction loss proves weakly correlated in most cases, the field may shift toward multi-objective or surrogate-based tuning that directly targets downstream metrics.
- The benchmark size makes it feasible to study transfer across modalities, which could reduce the need for modality-specific tuning from scratch.
Load-bearing premise
The four chosen autoencoder architectures and the TCGA plus SCHC modalities together stand in for the wider range of unsupervised biological representation learning problems.
What would settle it
A new HPO method that ranks differently when tested on additional omics datasets or architectures not included in the benchmark would show that the supplied baselines do not generalize.
Figures
read the original abstract
The rapid advancement of high-throughput sequencing has led to large, high-dimensional omics datasets. Deep unsupervised learning architectures, particularly Autoencoders (AEs), are increasingly used for dimensionality reduction and representation learning in this domain. However, AEs are highly sensitive to architectural choices and hyperparameters, and unsupervised optimization typically relies on reconstruction loss, which may be a poor proxy for downstream utility. Exhaustive hyperparameter optimization (HPO) is computationally expensive, leading researchers to frequently rely on suboptimal default configurations. To democratize access to large-scale unsupervised HPO research, we introduce $\textbf{BBOmix}$, the first open-source tabular benchmark for unsupervised representation learning on real-world biological data. Our benchmark includes 105,000 evaluations across four AE architectures and seven multi-omics modalities from the TCGA and SCHC datasets. We quantify the correlation between reconstruction loss and downstream task performance and provide an extensive evaluation of state-of-the-art single-fidelity, multi-fidelity, and transfer learning HPO methods, establishing a rigorous baseline for future research in unsupervised biological representation learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BBOmix as the first open-source tabular benchmark for hyperparameter optimization of unsupervised representation learning on biological data. It consists of 105,000 evaluations across four autoencoder architectures and seven multi-omics modalities drawn from the TCGA and SCHC datasets. The work quantifies correlations between reconstruction loss and downstream task performance while providing baselines for single-fidelity, multi-fidelity, and transfer-learning HPO methods.
Significance. If the chosen datasets and architectures are representative, the scale of the released evaluations and the open-source tabular format would constitute a useful community resource for studying HPO in unsupervised biological representation learning. The explicit examination of reconstruction loss as a proxy for downstream utility directly addresses a practical limitation in the field. The provision of extensive HPO baselines is a concrete strength that can support reproducible follow-on work.
major comments (2)
- [Abstract and Datasets section] Abstract and Datasets section: The central claim that BBOmix constitutes a 'durable community benchmark' is load-bearing on the representativeness of the TCGA/SCHC datasets and the four selected AE architectures. No coverage argument, diversity analysis across data regimes (e.g., single-cell vs. bulk), or justification for these specific choices is supplied, leaving open the possibility that correlation profiles or HPO difficulty differ materially on other biological data.
- [Methods and Results sections] Methods and Results sections: The abstract reports correlations and HPO baselines without reference to data splits, statistical procedures for the correlations, or presence/absence of error bars. These omissions are load-bearing for assessing the reliability of the claimed baselines and the reconstruction-downstream correlation findings.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, proposing revisions where the manuscript can be strengthened without misrepresenting its scope.
read point-by-point responses
-
Referee: [Abstract and Datasets section] Abstract and Datasets section: The central claim that BBOmix constitutes a 'durable community benchmark' is load-bearing on the representativeness of the TCGA/SCHC datasets and the four selected AE architectures. No coverage argument, diversity analysis across data regimes (e.g., single-cell vs. bulk), or justification for these specific choices is supplied, leaving open the possibility that correlation profiles or HPO difficulty differ materially on other biological data.
Authors: We agree that the manuscript would benefit from an explicit justification for the dataset and architecture selections to support the benchmark claim. TCGA and SCHC represent standard, publicly available multi-omics resources in cancer genomics with established preprocessing pipelines, and the four AE architectures cover common unsupervised approaches used in the field. However, no diversity analysis across regimes such as single-cell data is present. We will add a dedicated paragraph to the Datasets section providing rationale based on prevalence in prior literature, computational tractability for 105k evaluations, and a clear limitations statement noting the focus on bulk omics. This does not claim universality but addresses the coverage concern directly. revision: yes
-
Referee: [Methods and Results sections] Methods and Results sections: The abstract reports correlations and HPO baselines without reference to data splits, statistical procedures for the correlations, or presence/absence of error bars. These omissions are load-bearing for assessing the reliability of the claimed baselines and the reconstruction-downstream correlation findings.
Authors: The abstract is intentionally concise and does not include these methodological details. The full Methods section specifies 5-fold cross-validation splits and Pearson correlation with p-values; Results figures and tables include error bars (standard deviation across folds). To improve standalone readability of the abstract, we will add a short clause referencing 'cross-validated evaluation and statistical correlation measures with error bars reported in results.' This revision targets the abstract only, as the underlying procedures are already documented in the paper body. revision: yes
Circularity Check
Empirical benchmark introduction with no derivation chain
full rationale
This is an empirical benchmark paper whose central contribution is the release of 105k tabulated HPO runs on TCGA/SCHC multi-omics data using four AE architectures. No equations, predictions, or first-principles derivations are presented that could reduce to fitted inputs, self-definitions, or self-citation chains. The claim of being 'the first open-source tabular benchmark' is a factual assertion about availability, not a derived result. Representativeness of the chosen datasets and architectures is an external validity question, not a circularity issue inside any derivation. The paper is therefore self-contained against external benchmarks with score 0.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Bergstra, J., Bardenet, R., Bengio, Y., and K \'e gl, B. (2011). Algorithms for hyper-parameter optimization. Advances in neural information processing systems , 24
2011
-
[2]
and Bengio, Y
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of machine learning research , 13(2)
2012
-
[3]
Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., Thomas, J., Ullmann, T., Becker, M., Boulesteix, A.-L., et al. (2023). Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 13(2):e1484
2023
-
[4]
T., Li, X., Grosse, R
Chen, R. T., Li, X., Grosse, R. B., and Duvenaud, D. K. (2018). Isolating sources of disentanglement in variational autoencoders. Advances in neural information processing systems , 31
2018
-
[5]
and Herrmann, C
Doncevic, D. and Herrmann, C. (2023). Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations. Bioinformatics , 39(6):btad387
2023
-
[6]
Dong, X. and Yang, Y. (2020). Nas-bench-201: Extending the scope of reproducible neural architecture search. arXiv preprint arXiv:2001.00326
arXiv 2020
-
[7]
Eggensperger, K., Hutter, F., Hoos, H., and Leyton-Brown, K. (2015). Efficient benchmarking of hyperparameter optimizers via surrogates. In Proceedings of the AAAI conference on artificial intelligence , volume 29
2015
-
[8]
M., Mircea, M., Mueller, N
Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S., and Theis, F. J. (2019). Single-cell rna-seq denoising using a deep count autoencoder. Nature communications , 10(1):390
2019
-
[9]
Ewald, J. (2025). Autoencodix raw data for reproducibility. https://doi.org/10.5281/zenodo.15518831
-
[10]
Falkner, S., Klein, A., and Hutter, F. (2018). Bohb: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning , pages 1437--1446. PMLR
2018
-
[11]
Feurer, M., Springenberg, J., and Hutter, F. (2015). Initializing bayesian hyperparameter optimization via meta-learning. In Proceedings of the AAAI conference on artificial intelligence , volume 29
2015
-
[12]
Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research , 20(177):1--81
2019
-
[13]
Franceschi, L., Donini, M., Perrone, V., Klein, A., Archambeau, C., Seeger, M., Pontil, M., and Frasconi, P. (2025). Hyperparameter optimization in machine learning. arXiv:2410.22854 [stat.ML]
arXiv 2025
-
[14]
Garnett, R. (2023). Bayesian Optimization . Cambridge University Press
2023
-
[15]
Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). beta- VAE : Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR'17)
2017
-
[16]
and Greene, C
Hu, Q. and Greene, C. S. (2018). Parameter tuning is a key part of dimensionality reduction via deep variational autoencoders for single cell rna transcriptomics. In BIOCOMPUTING 2019: proceedings of the Pacific symposium , pages 362--373. World Scientific
2018
-
[17]
and Talwalkar, A
Jamieson, K. and Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS'16)
2016
-
[18]
J., Jurenaite, N., Pra s c evi \'c , D., Scherf, N., and Ewald, J
Joas, M. J., Jurenaite, N., Pra s c evi \'c , D., Scherf, N., and Ewald, J. (2025). Autoencodix: a generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond. Nature Computational Science , pages 1--13
2025
-
[19]
Klein, A. and Hutter, F. (2019). Tabular benchmarks for joint architecture and hyperparameter optimization. arXiv preprint arXiv:1905.04970
Pith/arXiv arXiv 2019
-
[20]
and Claassen, M
Kopf, A. and Claassen, M. (2021). Latent representation learning in biology and translational medicine. Patterns , 2(3)
2021
-
[21]
Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of machine learning research , 18(185):1--52
2018
-
[22]
Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Ben-Tzur, J., Hardt, M., Recht, B., and Talwalkar, A. (2020). A system for massively parallel hyperparameter tuning. Proceedings of machine learning and systems , 2:230--246
2020
-
[23]
Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Sch \"o lkopf, B., and Bachem, O. (2019). Challenging common assumptions in the unsupervised learning of disentangled representations. In international conference on machine learning , pages 4114--4124. PMLR
2019
-
[24]
L., Srivatsan, S
Lotfollahi, M., Klimovskaia Susmelj, A., De Donno, C., Hetzel, L., Ji, Y., Ibarra, I. L., Srivatsan, S. R., Naghipourfar, M., Daza, R. M., Martin, B., et al. (2023a). Predicting cellular responses to complex perturbations in high-throughput screens. Molecular systems biology , 19(6):MSB202211517
-
[25]
V., and Theis, F
Lotfollahi, M., Rybakov, S., Hrovatin, K., Hediyeh-Zadeh, S., Talavera-L \'o pez, C., Misharin, A. V., and Theis, F. J. (2023b). Biologically informed deep learning to query gene programs in single-cell atlases. Nature Cell Biology , 25(2):337--350
-
[26]
Mamoshina, P., Vieira, A., Putin, E., and Zhavoronkov, A. (2016). Applications of deep learning in biomedicine. Molecular pharmaceutics , 13(5):1445--1454
2016
-
[27]
Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. Trends in genetics , 24(3):133--141
2008
-
[28]
Milacic, M., Beavers, D., Conley, P., Gong, C., Gillespie, M., Griss, J., Haw, R., Jassal, B., Matthews, L., May, B., et al. (2024). The reactome pathway knowledgebase 2024. Nucleic acids research , 52(D1):D672--D678
2024
-
[29]
Ovcharenko, O., Barkmann, F., Toma, P., Daunhawer, I., Vogt, J., Schelter, S., and Boeva, V. (2025). Scssl-bench: Benchmarking self-supervised learning for single-cell data. arXiv preprint arXiv:2506.10031
arXiv 2025
-
[30]
W., Archambeau, C., and Jenatton, R
Perrone, V., Shen, H., Seeger, M. W., Archambeau, C., and Jenatton, R. (2019). Learning search spaces for bayesian optimization: Another view of hyperparameter transfer learning. Advances in neural information processing systems , 32
2019
-
[31]
Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. (2019). Regularized evolution for image classifier architecture search. In Proceedings of the aaai conference on artificial intelligence , volume 33, pages 4780--4789
2019
-
[32]
A., Spacek, D
Reuter, J. A., Spacek, D. V., and Snyder, M. P. (2015). High-throughput sequencing technologies. Molecular cell , 58(4):586--597
2015
-
[33]
Salinas, D. and Erickson, N. (2023). Tabrepo: A large scale repository of tabular model evaluations and its automl applications. arXiv preprint arXiv:2311.02971
arXiv 2023
-
[34]
Salinas, D., Golebiowski, J., Klein, A., Seeger, M., and Archambeau, C. (2023). Optimizing hyperparameters with conformal quantile regression. In International Conference on Machine Learning , pages 29876--29893. PMLR
2023
-
[35]
Salinas, D., Seeger, M., Klein, A., Perrone, V., Wistuba, M., and Archambeau, C. (2022). Syne tune: A library for large scale hyperparameter tuning and reproducible research. In International Conference on Automated Machine Learning , pages 16--1. PMLR
2022
-
[36]
Salinas, D., Shen, H., and Perrone, V. (2020). A quantile-based approach for hyperparameter transfer learning. In International conference on machine learning , pages 8438--8448. PMLR
2020
-
[37]
A., Jakhmola, R., Sprang, M., Gro mann, G., Raki, H., Maani, N., Pavliuk, D., Ewald, J., and Vollmer, S
Selby, D. A., Jakhmola, R., Sprang, M., Gro mann, G., Raki, H., Maani, N., Pavliuk, D., Ewald, J., and Vollmer, S. (2025). Visible neural networks for multi-omics integration: a critical review. Frontiers in Artificial Intelligence , 8:1595291
2025
-
[38]
Seninge, L., Anastopoulos, I., Ding, H., and Stuart, J. (2021). Vega is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nature communications , 12(1):5684
2021
-
[39]
Simidjievski, N., Bodnar, C., Tariq, I., Scherer, P., Andres Terre, H., Shams, Z., Jamnik, M., and Li \`o , P. (2019). Variational autoencoders for cancer data integration: design principles and computational practice. Frontiers in genetics , 10:1205
2019
-
[40]
Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical B ayesian optimization of machine learning algorithms. In Proceedings of the 25th International Conference on Advances in Neural Information Processing Systems (NeurIPS'12)
2012
-
[41]
Tiao, L., Klein, A., Seeger, M., Archambeau, C., Bonilla, E., and Ramos, F. (2020). Bayesian optimization by density ratio estimation
2020
-
[42]
N., Collisson, E
Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013). The cancer genome atlas pan-cancer analysis project. Nature genetics , 45(10):1113--1120
2013
-
[43]
S., and Pollard, K
Whalen, S., Schreiber, J., Noble, W. S., and Pollard, K. S. (2022). Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics , 23(3):169--181
2022
-
[44]
Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015a). Learning hyperparameter optimization initializations. In 2015 IEEE international conference on data science and advanced analytics (DSAA) , pages 1--10. IEEE
2015
-
[45]
Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015b). Sequential model-free hyperparameter tuning. In 2015 IEEE international conference on data mining , pages 1033--1038. IEEE
2015
-
[46]
D., Belyaeva, A., Venkatachalapathy, S., Damodaran, K., Katcoff, A., Radhakrishnan, A., Shivashankar, G., and Uhler, C
Yang, K. D., Belyaeva, A., Venkatachalapathy, S., Damodaran, K., Katcoff, A., Radhakrishnan, A., Shivashankar, G., and Uhler, C. (2021). Multi-domain translation between single-cell imaging and sequencing data using autoencoders. Nature communications , 12(1):31
2021
-
[47]
M., Coleman, C., Clarence, T., Latouche, O., Tsankova, N
Zhu, K., Bendl, J., Rahman, S., Vicari, J. M., Coleman, C., Clarence, T., Latouche, O., Tsankova, N. M., Li, A., Brennand, K. J., et al. (2023). Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Science advances , 9(41):eadg3754
2023
-
[48]
Zimmer, L., Lindauer, M., and Hutter, F. (2021). Auto-pytorch: Multi-fidelity metalearning for efficient and robust autodl. IEEE transactions on pattern analysis and machine intelligence , 43(9):3079--3090
2021
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.