BBOmix: A Tabular Benchmark for Hyperparameter Optimization of Unsupervised Biological Representation Learning

Aaron Klein; Jan Ewald; Luca Thale-Bombien; Ralf König

arxiv: 2606.05139 · v1 · pith:4PYVAZ5Wnew · submitted 2026-06-03 · 💻 cs.LG

Pith reviewed 2026-06-28 06:49 UTC · model grok-4.3

classification 💻 cs.LG

keywords benchmark · hyperparameter optimization · autoencoders · multi-omics data · representation learning · unsupervised learning · biological datasets · tabular data

The pith

BBOmix supplies 105,000 hyperparameter evaluations of autoencoders on real multi-omics datasets to enable systematic study of unsupervised representation learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BBOmix to give researchers access to large-scale hyperparameter optimization results without repeating expensive computations on biological data. It covers four autoencoder architectures and seven modalities drawn from the TCGA and SCHC collections, for a total of 105,000 recorded runs. The work measures how well reconstruction loss predicts performance on downstream tasks and runs existing single-fidelity, multi-fidelity, and transfer-learning optimizers to create baselines. A sympathetic reader would care because unsupervised models in this domain are known to be sensitive to architectural and hyperparameter choices, yet they are usually tuned with default configurations or with reconstruction loss as a cheap proxy.

Core claim

BBOmix is the first open tabular benchmark for hyperparameter optimization of unsupervised representation learning on real-world biological data; it records 105,000 evaluations across four autoencoder architectures and seven multi-omics modalities from TCGA and SCHC, quantifies the correlation between reconstruction loss and downstream utility, and supplies baseline results for single-fidelity, multi-fidelity, and transfer-learning HPO methods.

What carries the argument

BBOmix, the collection of 105,000 tabulated hyperparameter evaluations that serves as a reusable testbed for comparing HPO algorithms on biological autoencoders.
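
To make the idea of a tabular benchmark concrete, here is a minimal Python sketch, assuming a toy table with two hyperparameters. The table contents, the field names (`recon_loss`, `downstream_auc`, `runtime_s`) and both helper functions are hypothetical; they do not reflect the actual BBOmix schema, which this page does not describe. The sketch only shows that an optimizer queries stored results instead of training an autoencoder, and that wall-clock time can be simulated by summing recorded runtimes.

```python
import random

# Hypothetical toy table: (dropout, learning_rate) -> recorded metrics.
# All values are made up for illustration; this is not BBOmix data.
TABLE = {
    (0.0, 1e-3): {"recon_loss": 0.42, "downstream_auc": 0.71, "runtime_s": 120.0},
    (0.1, 1e-3): {"recon_loss": 0.45, "downstream_auc": 0.78, "runtime_s": 125.0},
    (0.3, 1e-3): {"recon_loss": 0.51, "downstream_auc": 0.74, "runtime_s": 130.0},
    (0.0, 1e-2): {"recon_loss": 0.60, "downstream_auc": 0.62, "runtime_s": 90.0},
    (0.1, 1e-2): {"recon_loss": 0.58, "downstream_auc": 0.66, "runtime_s": 95.0},
    (0.3, 1e-2): {"recon_loss": 0.63, "downstream_auc": 0.64, "runtime_s": 100.0},
}


def evaluate(config):
    """Look up a precomputed result instead of training a model."""
    return TABLE[config]


def random_search(n_trials, seed=0):
    """Random search over the table, maximizing the downstream metric."""
    rng = random.Random(seed)
    configs = sorted(TABLE)
    best_config, best_auc, simulated_time = None, float("-inf"), 0.0
    for _ in range(n_trials):
        config = rng.choice(configs)
        result = evaluate(config)
        # Simulated wall-clock time: add the runtime stored in the table.
        simulated_time += result["runtime_s"]
        if result["downstream_auc"] > best_auc:
            best_config, best_auc = config, result["downstream_auc"]
    return best_config, best_auc, simulated_time


if __name__ == "__main__":
    print(random_search(n_trials=10))
```

Because every call to `evaluate` is a dictionary lookup, repeating the search with many seeds or many optimizers costs almost nothing, which is what makes this kind of benchmark attractive for comparing HPO methods.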

If this is right

Researchers can now compare new HPO algorithms against published baselines without repeating the full evaluation budget.
Optimization routines can be chosen according to the measured correlation between reconstruction loss and downstream task performance rather than by assumption.
Default hyperparameter settings can be replaced by configurations already shown to work across the recorded modalities.
Future work can extend the benchmark by adding more runs or modalities while reusing the same tabular format.
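
The correlation between reconstruction loss and downstream performance mentioned above can be computed in a few lines. The following is a minimal sketch of the Pearson correlation coefficient in plain Python; the function name and the example numbers are hypothetical, and the statistical procedure the paper actually uses is not described in this part of the page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values: one entry per hyperparameter configuration.
    recon_loss = [0.50, 0.42, 0.38, 0.35]
    downstream_auc = [0.70, 0.74, 0.73, 0.78]
    print(pearson(recon_loss, downstream_auc))
```

A coefficient close to -1 would mean that lower reconstruction loss reliably goes with higher downstream scores, so the loss is a usable proxy; a value near 0 would argue for tuning against the downstream metric directly.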

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same tabular format could be applied to other unsupervised models such as variational autoencoders or contrastive learners on biological data.
If reconstruction loss proves weakly correlated in most cases, the field may shift toward multi-objective or surrogate-based tuning that directly targets downstream metrics.
The benchmark size makes it feasible to study transfer across modalities, which could reduce the need for modality-specific tuning from scratch.

Load-bearing premise

The four chosen autoencoder architectures and the TCGA plus SCHC modalities together stand in for the wider range of unsupervised biological representation learning problems.

What would settle it

A new HPO method that ranks differently when tested on additional omics datasets or architectures not included in the benchmark would show that the supplied baselines do not generalize.
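
Whether a method "ranks differently" depends on how regret and rank are aggregated. The sketch below shows one common way to compute normalized regret and average rank for a maximized metric. The exact definitions used in the paper are not given on this page, so the formulas, function names, and tie handling here are assumptions.

```python
def normalized_regret(best_found, table_best, table_worst):
    """Regret scaled to [0, 1]: 0 means the best tabulated value was found."""
    span = table_best - table_worst
    if span <= 0:
        raise ValueError("table_best must exceed table_worst")
    return (table_best - best_found) / span


def ranks(scores):
    """Rank methods on one task (1 = best, higher score is better).

    Tied methods share the mean of the ranks they occupy.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for name in ordered[i : j + 1]:
            result[name] = mean_rank
        i = j + 1
    return result


def average_rank(per_task_scores):
    """Average each method's rank over tasks (every task must list every method)."""
    totals = {}
    for scores in per_task_scores:
        for name, rank in ranks(scores).items():
            totals[name] = totals.get(name, 0.0) + rank
    return {name: total / len(per_task_scores) for name, total in totals.items()}


if __name__ == "__main__":
    # Made-up scores for two hypothetical methods on two tasks.
    tasks = [{"method_a": 0.90, "method_b": 0.80}, {"method_a": 0.70, "method_b": 0.80}]
    print(average_rank(tasks))
```

Average rank discards the size of the gaps between methods, while normalized regret keeps it, which is presumably why benchmark papers tend to report both.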

Figures

Figures reproduced from arXiv: 2606.05139 by Aaron Klein, Jan Ewald, Luca Thale-Bombien, Ralf König.

**Figure 2.** Hyperparameter importance and proxy metric correlation.

**Figure 3.** Normalized regret (a-c) and average rank (d-f) of single-fidelity, multi-fidelity and transfer-learning optimization algorithms, averaged over all architectures and tasks. [...] hyperparameters suggested by Syne Tune (Salinas et al., 2022), with the exception of REA, where the population size was reduced to avoid sampling too many random configurations. Each method had a simulated wall-clock time budget of 72000…

**Figure 4.** HP importance Vanillix. Dropout 𝑝 dominates downstream performance across both datasets, followed by learning rate. Only for Reconstruction loss, the Input Dimensionality 𝐷 is ranked second.

**Figure 5.** HP importance Varix. Dropout 𝑝 is dominant for downstream performance, followed by the learning rate and the architecture specific parameter 𝛽. For reconstruction loss, input dim 𝐷 rises to second place, and the VAE-specific 𝛽 term contributes only marginally.

**Figure 6.** HP importance Ontix. Learning rate is exceptionally dominant, accounting for the vast majority of explained variance in downstream performance on both SCHC and TCGA. 𝛽 (VAE) is the second most important HP, while dropout 𝑝 ranks third. For reconstruction loss, learning rate and input dim 𝐷 are nearly equally important.

**Figure 7.** HP importance Disentanglix. Dropout 𝑝 leads for downstream performance, followed by learning rate and 𝛽𝑡𝑐, indicating that the total-correlation penalty meaningfully affects generalisation. For reconstruction loss, dropout 𝑝 and input dim 𝐷 dominate, while 𝛽𝑡𝑐 is negligible.

**Figure 8.** Per-task HP importance Vanillix. Dropout rate dominates consistently regardless of task, with 𝛽 (VAE) as a stable second contributor.

**Figure 9.** Per-task HP importance Varix. Dropout rate dominates consistently regardless of task, with 𝛽 (VAE) as a stable second contributor.

**Figure 10.** Per-task HP importance Ontix. Learning rate dominates consistently regardless of task, with 𝛽 (VAE) as a stable second contributor.

**Figure 11.** Per-task HP importance Disentanglix. Dropout 𝑝 and 𝛽𝑡𝑐 both contribute across tasks, with a more distributed importance profile than Ontix.

**Figure 12.** Per-task performance averaged over all architectures on the SCHC dataset.

**Figure 13.** Per-task performance averaged over all architectures on the TCGA dataset.

**Figure 14.** Results for Disentanglix tasks (Part 1).

**Figure 15.** Results for Disentanglix tasks (Part 2).

**Figure 16.** Results for Disentanglix (autoencodix-disentanglix_tcga-tcga-RNA-DNA-METH-CLIN).

**Figure 17.** Results for Ontix tasks (Part 1).

**Figure 18.** Results for Ontix tasks (Part 2).

**Figure 19.** Results for Ontix tasks (Part 3).

**Figure 20.** Results for Ontix tasks (Part 4).

**Figure 21.** Results for Ontix tasks (Part 5).

**Figure 22.** Results for Vanillix tasks (Part 1).

**Figure 23.** Results for Vanillix tasks (Part 2).

**Figure 24.** Results for Vanillix (autoencodix-vanillix_tcga-tcga-RNA-DNA-METH-CLIN).

**Figure 25.** Results for Varix tasks (Part 1).

**Figure 26.** Results for Varix tasks (Part 2).

**Figure 27.** Results for Varix (autoencodix-varix_tcga-tcga-RNA-DNA-METH-CLIN).

**Figure 28.** Disentanglix: optimization trajectories maximizing downstream performance

**Figure 29.** Varix: optimization trajectories maximizing downstream performance

**Figure 30.** Vanillix: optimization trajectories maximizing downstream performance

**Figure 31.** Ontix: optimization trajectories maximizing downstream performance
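
The HP-importance figures report permutation importance as the mean decrease in R². As a rough illustration of that measure, the sketch below shuffles one input column at a time and records how much the R² of a fixed predictor drops. The predictor, the data, and the function names are hypothetical; how the authors fit the model whose R² is measured is not stated on this page.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean decrease in R² when each column of X is shuffled in turn.

    X is a list of rows (lists of floats); predict maps one row to a float.
    """
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            X_perm = [row[:col] + [v] + row[col + 1 :] for row, v in zip(X, shuffled)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical predictor that depends only on the first column.
    X = [[float(i), float((i * 7) % 5)] for i in range(20)]
    y = [2.0 * row[0] for row in X]
    print(permutation_importance(lambda row: 2.0 * row[0], X, y))
```

A column the predictor ignores gets an importance of exactly zero, while a column it depends on gets a positive value, which matches how the bar charts in these figures are meant to be read.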

Original abstract

The rapid advancement of high-throughput sequencing has led to large, high-dimensional omics datasets. Deep unsupervised learning architectures, particularly Autoencoders (AEs), are increasingly used for dimensionality reduction and representation learning in this domain. However, AEs are highly sensitive to architectural choices and hyperparameters, and unsupervised optimization typically relies on reconstruction loss, which may be a poor proxy for downstream utility. Exhaustive hyperparameter optimization (HPO) is computationally expensive, leading researchers to frequently rely on suboptimal default configurations. To democratize access to large-scale unsupervised HPO research, we introduce BBOmix, the first open-source tabular benchmark for unsupervised representation learning on real-world biological data. Our benchmark includes 105,000 evaluations across four AE architectures and seven multi-omics modalities from the TCGA and SCHC datasets. We quantify the correlation between reconstruction loss and downstream task performance and provide an extensive evaluation of state-of-the-art single-fidelity, multi-fidelity, and transfer learning HPO methods, establishing a rigorous baseline for future research in unsupervised biological representation learning.
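
For readers unfamiliar with the term, multi-fidelity HPO methods evaluate many configurations cheaply (for example, after a few epochs) and spend the full budget only on promising ones. The sketch below illustrates successive halving (Jamieson and Talwalkar, 2016, in the reference list) on a hypothetical table of per-epoch validation scores. The configuration names and curves are made up, and this is not claimed to be the exact procedure behind any of the paper's baselines.

```python
def successive_halving(curves, min_epochs=1, eta=2):
    """Pick a configuration by repeatedly keeping the top 1/eta fraction.

    curves maps a configuration name to its validation score after each
    epoch (higher is better). All curves are read from a precomputed table.
    """
    max_epochs = min(len(curve) for curve in curves.values())
    survivors = list(curves)
    epochs = min_epochs
    while len(survivors) > 1 and epochs <= max_epochs:
        # Compare survivors at the current fidelity (number of epochs).
        survivors.sort(key=lambda name: curves[name][epochs - 1], reverse=True)
        survivors = survivors[: max(1, len(survivors) // eta)]
        epochs *= eta
    return survivors[0]


if __name__ == "__main__":
    # Made-up learning curves for four hypothetical configurations.
    curves = {
        "a": [0.50, 0.65, 0.70, 0.80],
        "b": [0.60, 0.62, 0.63, 0.64],
        "c": [0.40, 0.50, 0.55, 0.60],
        "d": [0.30, 0.35, 0.40, 0.45],
    }
    print(successive_halving(curves))
```

In this toy run, configuration `b` leads after one epoch but `a` overtakes it after two; a more aggressive reduction factor would have eliminated `a` too early, which is the usual trade-off in this family of methods.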

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BBOmix ships a 105k-run benchmark on TCGA/SCHC multi-omics AEs that is new and usable, but its claim to be a durable community standard rests on untested representativeness.

The paper's core contribution is BBOmix, an open tabular benchmark with 105,000 hyperparameter evaluations across four autoencoder architectures and seven multi-omics modalities drawn from TCGA and SCHC. It reports correlations between reconstruction loss and downstream task performance and supplies baseline results for single-fidelity, multi-fidelity, and transfer-learning HPO methods.

The scale is the real value. Exhaustive HPO on these models is expensive, so releasing the full set of runs and the correlation numbers gives the subfield a concrete starting point instead of ad-hoc defaults. That part is straightforward and directly addresses the problem stated in the abstract.

The soft spot is scope. The benchmark is positioned as the first general resource for unsupervised biological representation learning, yet it uses only bulk cancer cohorts and four AE variants. No explicit argument shows why these choices sample the broader space (single-cell data, non-cancer cohorts, VAEs, or contrastive models). If the correlation patterns or HPO difficulty profiles shift on other regimes, the benchmark becomes a narrow testbed rather than a stable proxy. The abstract does not contain a diversity analysis that would close this gap.

This is for computational biologists who run unsupervised models on omics data and want a shared HPO testbed. It is worth sending to peer review because the released evaluations are a tangible resource that others can use or extend, even if the representativeness question needs to be aired in review.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces BBOmix as the first open-source tabular benchmark for hyperparameter optimization of unsupervised representation learning on biological data. It consists of 105,000 evaluations across four autoencoder architectures and seven multi-omics modalities drawn from the TCGA and SCHC datasets. The work quantifies correlations between reconstruction loss and downstream task performance while providing baselines for single-fidelity, multi-fidelity, and transfer-learning HPO methods.

Significance. If the chosen datasets and architectures are representative, the scale of the released evaluations and the open-source tabular format would constitute a useful community resource for studying HPO in unsupervised biological representation learning. The explicit examination of reconstruction loss as a proxy for downstream utility directly addresses a practical limitation in the field. The provision of extensive HPO baselines is a concrete strength that can support reproducible follow-on work.

major comments (2)

[Abstract and Datasets section] The central claim that BBOmix constitutes a 'durable community benchmark' is load-bearing on the representativeness of the TCGA/SCHC datasets and the four selected AE architectures. No coverage argument, diversity analysis across data regimes (e.g., single-cell vs. bulk), or justification for these specific choices is supplied, leaving open the possibility that correlation profiles or HPO difficulty differ materially on other biological data.
[Methods and Results sections] The abstract reports correlations and HPO baselines without reference to data splits, statistical procedures for the correlations, or presence/absence of error bars. These omissions are load-bearing for assessing the reliability of the claimed baselines and the reconstruction-downstream correlation findings.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, proposing revisions where the manuscript can be strengthened without misrepresenting its scope.

Referee: [Abstract and Datasets section] The central claim that BBOmix constitutes a 'durable community benchmark' is load-bearing on the representativeness of the TCGA/SCHC datasets and the four selected AE architectures. No coverage argument, diversity analysis across data regimes (e.g., single-cell vs. bulk), or justification for these specific choices is supplied, leaving open the possibility that correlation profiles or HPO difficulty differ materially on other biological data.

Authors: We agree that the manuscript would benefit from an explicit justification for the dataset and architecture selections to support the benchmark claim. TCGA and SCHC represent standard, publicly available multi-omics resources in cancer genomics with established preprocessing pipelines, and the four AE architectures cover common unsupervised approaches used in the field. However, no diversity analysis across regimes such as single-cell data is present. We will add a dedicated paragraph to the Datasets section providing rationale based on prevalence in prior literature, computational tractability for 105k evaluations, and a clear limitations statement noting the focus on bulk omics. This does not claim universality but addresses the coverage concern directly. Revision: yes.

Referee: [Methods and Results sections] The abstract reports correlations and HPO baselines without reference to data splits, statistical procedures for the correlations, or presence/absence of error bars. These omissions are load-bearing for assessing the reliability of the claimed baselines and the reconstruction-downstream correlation findings.

Authors: The abstract is intentionally concise and does not include these methodological details. The full Methods section specifies 5-fold cross-validation splits and Pearson correlation with p-values; Results figures and tables include error bars (standard deviation across folds). To improve standalone readability of the abstract, we will add a short clause referencing 'cross-validated evaluation and statistical correlation measures with error bars reported in results.' This revision targets the abstract only, as the underlying procedures are already documented in the paper body. Revision: yes.

Circularity Check

0 steps flagged

Empirical benchmark introduction with no derivation chain

This is an empirical benchmark paper whose central contribution is the release of 105k tabulated HPO runs on TCGA/SCHC multi-omics data using four AE architectures. No equations, predictions, or first-principles derivations are presented that could reduce to fitted inputs, self-definitions, or self-citation chains. The claim of being 'the first open-source tabular benchmark' is a factual assertion about availability, not a derived result. Representativeness of the chosen datasets and architectures is an external validity question, not a circularity issue inside any derivation. No step is therefore flagged as circular, and the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmark paper; the central claim introduces no mathematical model, free parameters, axioms, or new postulated entities.

Reference graph

Works this paper leans on

48 extracted references · 1 canonical work pages

[1] Bergstra, J., Bardenet, R., Bengio, Y., and Kégl, B. (2011). Algorithms for hyper-parameter optimization. Advances in neural information processing systems, 24.
[2] Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of machine learning research, 13(2).
[3] Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., Thomas, J., Ullmann, T., Becker, M., Boulesteix, A.-L., et al. (2023). Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 13(2):e1484.
[4] Chen, R. T., Li, X., Grosse, R. B., and Duvenaud, D. K. (2018). Isolating sources of disentanglement in variational autoencoders. Advances in neural information processing systems, 31.
[5] Doncevic, D. and Herrmann, C. (2023). Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations. Bioinformatics, 39(6):btad387.
[6] Dong, X. and Yang, Y. (2020). NAS-Bench-201: Extending the scope of reproducible neural architecture search. arXiv preprint arXiv:2001.00326.
[7] Eggensperger, K., Hutter, F., Hoos, H., and Leyton-Brown, K. (2015). Efficient benchmarking of hyperparameter optimizers via surrogates. In Proceedings of the AAAI conference on artificial intelligence, volume 29.
[8] Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S., and Theis, F. J. (2019). Single-cell RNA-seq denoising using a deep count autoencoder. Nature communications, 10(1):390.
[9] Ewald, J. (2025). Autoencodix raw data for reproducibility. https://doi.org/10.5281/zenodo.15518831
[10] Falkner, S., Klein, A., and Hutter, F. (2018). BOHB: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning, pages 1437-1446. PMLR.
[11] Feurer, M., Springenberg, J., and Hutter, F. (2015). Initializing Bayesian hyperparameter optimization via meta-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 29.
[12] Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177):1-81.
[13] Franceschi, L., Donini, M., Perrone, V., Klein, A., Archambeau, C., Seeger, M., Pontil, M., and Frasconi, P. (2025). Hyperparameter optimization in machine learning. arXiv:2410.22854 [stat.ML].
[14] Garnett, R. (2023). Bayesian Optimization. Cambridge University Press.
[15] Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR'17).
[16] Hu, Q. and Greene, C. S. (2018). Parameter tuning is a key part of dimensionality reduction via deep variational autoencoders for single cell RNA transcriptomics. In BIOCOMPUTING 2019: proceedings of the Pacific symposium, pages 362-373. World Scientific.
[17] Jamieson, K. and Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS'16).
[18] Joas, M. J., Jurenaite, N., Praščević, D., Scherf, N., and Ewald, J. (2025). Autoencodix: a generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond. Nature Computational Science, pages 1-13.
[19] Klein, A. and Hutter, F. (2019). Tabular benchmarks for joint architecture and hyperparameter optimization. arXiv preprint arXiv:1905.04970.
[20] Kopf, A. and Claassen, M. (2021). Latent representation learning in biology and translational medicine. Patterns, 2(3).
[21] Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of machine learning research, 18(185):1-52.
[22] Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Ben-Tzur, J., Hardt, M., Recht, B., and Talwalkar, A. (2020). A system for massively parallel hyperparameter tuning. Proceedings of machine learning and systems, 2:230-246.
[23] Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., and Bachem, O. (2019). Challenging common assumptions in the unsupervised learning of disentangled representations. In International conference on machine learning, pages 4114-4124. PMLR.
[24] Lotfollahi, M., Klimovskaia Susmelj, A., De Donno, C., Hetzel, L., Ji, Y., Ibarra, I. L., Srivatsan, S. R., Naghipourfar, M., Daza, R. M., Martin, B., et al. (2023a). Predicting cellular responses to complex perturbations in high-throughput screens. Molecular systems biology, 19(6):MSB202211517.
[25] Lotfollahi, M., Rybakov, S., Hrovatin, K., Hediyeh-Zadeh, S., Talavera-López, C., Misharin, A. V., and Theis, F. J. (2023b). Biologically informed deep learning to query gene programs in single-cell atlases. Nature Cell Biology, 25(2):337-350.
[26] Mamoshina, P., Vieira, A., Putin, E., and Zhavoronkov, A. (2016). Applications of deep learning in biomedicine. Molecular pharmaceutics, 13(5):1445-1454.
[27] Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. Trends in genetics, 24(3):133-141.
[28] Milacic, M., Beavers, D., Conley, P., Gong, C., Gillespie, M., Griss, J., Haw, R., Jassal, B., Matthews, L., May, B., et al. (2024). The reactome pathway knowledgebase 2024. Nucleic acids research, 52(D1):D672-D678.
[29] Ovcharenko, O., Barkmann, F., Toma, P., Daunhawer, I., Vogt, J., Schelter, S., and Boeva, V. (2025). scSSL-Bench: Benchmarking self-supervised learning for single-cell data. arXiv preprint arXiv:2506.10031.
[30] Perrone, V., Shen, H., Seeger, M. W., Archambeau, C., and Jenatton, R. (2019). Learning search spaces for Bayesian optimization: Another view of hyperparameter transfer learning. Advances in neural information processing systems, 32.
[31] Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. (2019). Regularized evolution for image classifier architecture search. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4780-4789.
[32] Reuter, J. A., Spacek, D. V., and Snyder, M. P. (2015). High-throughput sequencing technologies. Molecular cell, 58(4):586-597.
[33] Salinas, D. and Erickson, N. (2023). TabRepo: A large scale repository of tabular model evaluations and its AutoML applications. arXiv preprint arXiv:2311.02971.
[34] Salinas, D., Golebiowski, J., Klein, A., Seeger, M., and Archambeau, C. (2023). Optimizing hyperparameters with conformal quantile regression. In International Conference on Machine Learning, pages 29876-29893. PMLR.
[35] Salinas, D., Seeger, M., Klein, A., Perrone, V., Wistuba, M., and Archambeau, C. (2022). Syne Tune: A library for large scale hyperparameter tuning and reproducible research. In International Conference on Automated Machine Learning, pages 16-1. PMLR.
[36] Salinas, D., Shen, H., and Perrone, V. (2020). A quantile-based approach for hyperparameter transfer learning. In International conference on machine learning, pages 8438-8448. PMLR.
[37] Selby, D. A., Jakhmola, R., Sprang, M., Großmann, G., Raki, H., Maani, N., Pavliuk, D., Ewald, J., and Vollmer, S. (2025). Visible neural networks for multi-omics integration: a critical review. Frontiers in Artificial Intelligence, 8:1595291.
[38] Seninge, L., Anastopoulos, I., Ding, H., and Stuart, J. (2021). VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nature communications, 12(1):5684.
[39] Simidjievski, N., Bodnar, C., Tariq, I., Scherer, P., Andres Terre, H., Shams, Z., Jamnik, M., and Liò, P. (2019). Variational autoencoders for cancer data integration: design principles and computational practice. Frontiers in genetics, 10:1205.
[40] Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Proceedings of the 25th International Conference on Advances in Neural Information Processing Systems (NeurIPS'12).
[41] Tiao, L., Klein, A., Seeger, M., Archambeau, C., Bonilla, E., and Ramos, F. (2020). Bayesian optimization by density ratio estimation.
[42] Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013). The cancer genome atlas pan-cancer analysis project. Nature genetics, 45(10):1113-1120.
[43] Whalen, S., Schreiber, J., Noble, W. S., and Pollard, K. S. (2022). Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics, 23(3):169-181.
[44] Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015a). Learning hyperparameter optimization initializations. In 2015 IEEE international conference on data science and advanced analytics (DSAA), pages 1-10. IEEE.
[45] Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015b). Sequential model-free hyperparameter tuning. In 2015 IEEE international conference on data mining, pages 1033-1038. IEEE.
[46] Yang, K. D., Belyaeva, A., Venkatachalapathy, S., Damodaran, K., Katcoff, A., Radhakrishnan, A., Shivashankar, G., and Uhler, C. (2021). Multi-domain translation between single-cell imaging and sequencing data using autoencoders. Nature communications, 12(1):31.
[47] Zhu, K., Bendl, J., Rahman, S., Vicari, J. M., Coleman, C., Clarence, T., Latouche, O., Tsankova, N. M., Li, A., Brennand, K. J., et al. (2023). Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Science advances, 9(41):eadg3754.
[48] Zimmer, L., Lindauer, M., and Hutter, F. (2021). Auto-PyTorch: Multi-fidelity metalearning for efficient and robust AutoDL. IEEE transactions on pattern analysis and machine intelligence, 43(9):3079-3090.

[1] [1]

Bergstra, J., Bardenet, R., Bengio, Y., and K \'e gl, B. (2011). Algorithms for hyper-parameter optimization. Advances in neural information processing systems , 24

2011

[2] [2]

and Bengio, Y

Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of machine learning research , 13(2)

2012

[3] [3]

Bischl, B., Binder, M., Lang, M., Pielok, T., Richter, J., Coors, S., Thomas, J., Ullmann, T., Becker, M., Boulesteix, A.-L., et al. (2023). Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery , 13(2):e1484

2023

[4] [4]

T., Li, X., Grosse, R

Chen, R. T., Li, X., Grosse, R. B., and Duvenaud, D. K. (2018). Isolating sources of disentanglement in variational autoencoders. Advances in neural information processing systems , 31

2018

[5] [5]

and Herrmann, C

Doncevic, D. and Herrmann, C. (2023). Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations. Bioinformatics , 39(6):btad387

2023

[6] [6]

and Yang, Y

Dong, X. and Yang, Y. (2020). Nas-bench-201: Extending the scope of reproducible neural architecture search. arXiv preprint arXiv:2001.00326

arXiv 2020

[7] Eggensperger, K., Hutter, F., Hoos, H., and Leyton-Brown, K. (2015). Efficient benchmarking of hyperparameter optimizers via surrogates. In Proceedings of the AAAI conference on artificial intelligence, volume 29.

[8] Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S., and Theis, F. J. (2019). Single-cell rna-seq denoising using a deep count autoencoder. Nature communications, 10(1):390.

[9] Ewald, J. (2025). Autoencodix raw data for reproducibility. https://doi.org/10.5281/zenodo.15518831

[10] Falkner, S., Klein, A., and Hutter, F. (2018). Bohb: Robust and efficient hyperparameter optimization at scale. In International conference on machine learning, pages 1437--1446. PMLR.

[11] Feurer, M., Springenberg, J., and Hutter, F. (2015). Initializing bayesian hyperparameter optimization via meta-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 29.

[12] Fisher, A., Rudin, C., and Dominici, F. (2019). All models are wrong, but many are useful: Learning a variable's importance by studying an entire class of prediction models simultaneously. Journal of Machine Learning Research, 20(177):1--81.

[13] Franceschi, L., Donini, M., Perrone, V., Klein, A., Archambeau, C., Seeger, M., Pontil, M., and Frasconi, P. (2025). Hyperparameter optimization in machine learning. arXiv:2410.22854 [stat.ML].

[14] Garnett, R. (2023). Bayesian Optimization. Cambridge University Press.

[15] Higgins, I., Matthey, L., Pal, A., Burgess, C., Glorot, X., Botvinick, M., Mohamed, S., and Lerchner, A. (2017). beta-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR'17).

[16] Hu, Q. and Greene, C. S. (2018). Parameter tuning is a key part of dimensionality reduction via deep variational autoencoders for single cell rna transcriptomics. In BIOCOMPUTING 2019: proceedings of the Pacific symposium, pages 362--373. World Scientific.

[17] Jamieson, K. and Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS'16).

[18] Joas, M. J., Jurenaite, N., Praščević, D., Scherf, N., and Ewald, J. (2025). Autoencodix: a generalized and versatile framework to train and evaluate autoencoders for biological representation learning and beyond. Nature Computational Science, pages 1--13.

[19] Klein, A. and Hutter, F. (2019). Tabular benchmarks for joint architecture and hyperparameter optimization. arXiv preprint arXiv:1905.04970.

[20] Kopf, A. and Claassen, M. (2021). Latent representation learning in biology and translational medicine. Patterns, 2(3).

[21] Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., and Talwalkar, A. (2018). Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of machine learning research, 18(185):1--52.

[22] Li, L., Jamieson, K., Rostamizadeh, A., Gonina, E., Ben-Tzur, J., Hardt, M., Recht, B., and Talwalkar, A. (2020). A system for massively parallel hyperparameter tuning. Proceedings of machine learning and systems, 2:230--246.

[23] Locatello, F., Bauer, S., Lucic, M., Raetsch, G., Gelly, S., Schölkopf, B., and Bachem, O. (2019). Challenging common assumptions in the unsupervised learning of disentangled representations. In international conference on machine learning, pages 4114--4124. PMLR.

[24] Lotfollahi, M., Klimovskaia Susmelj, A., De Donno, C., Hetzel, L., Ji, Y., Ibarra, I. L., Srivatsan, S. R., Naghipourfar, M., Daza, R. M., Martin, B., et al. (2023a). Predicting cellular responses to complex perturbations in high-throughput screens. Molecular systems biology, 19(6):MSB202211517.

[25] Lotfollahi, M., Rybakov, S., Hrovatin, K., Hediyeh-Zadeh, S., Talavera-López, C., Misharin, A. V., and Theis, F. J. (2023b). Biologically informed deep learning to query gene programs in single-cell atlases. Nature Cell Biology, 25(2):337--350.

[26] Mamoshina, P., Vieira, A., Putin, E., and Zhavoronkov, A. (2016). Applications of deep learning in biomedicine. Molecular pharmaceutics, 13(5):1445--1454.

[27] Mardis, E. R. (2008). The impact of next-generation sequencing technology on genetics. Trends in genetics, 24(3):133--141.

[28] Milacic, M., Beavers, D., Conley, P., Gong, C., Gillespie, M., Griss, J., Haw, R., Jassal, B., Matthews, L., May, B., et al. (2024). The reactome pathway knowledgebase 2024. Nucleic acids research, 52(D1):D672--D678.

[29] Ovcharenko, O., Barkmann, F., Toma, P., Daunhawer, I., Vogt, J., Schelter, S., and Boeva, V. (2025). Scssl-bench: Benchmarking self-supervised learning for single-cell data. arXiv preprint arXiv:2506.10031.

[30] Perrone, V., Shen, H., Seeger, M. W., Archambeau, C., and Jenatton, R. (2019). Learning search spaces for bayesian optimization: Another view of hyperparameter transfer learning. Advances in neural information processing systems, 32.

[31] Real, E., Aggarwal, A., Huang, Y., and Le, Q. V. (2019). Regularized evolution for image classifier architecture search. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 4780--4789.

[32] Reuter, J. A., Spacek, D. V., and Snyder, M. P. (2015). High-throughput sequencing technologies. Molecular cell, 58(4):586--597.

[33] Salinas, D. and Erickson, N. (2023). Tabrepo: A large scale repository of tabular model evaluations and its automl applications. arXiv preprint arXiv:2311.02971.

[34] Salinas, D., Golebiowski, J., Klein, A., Seeger, M., and Archambeau, C. (2023). Optimizing hyperparameters with conformal quantile regression. In International Conference on Machine Learning, pages 29876--29893. PMLR.

[35] Salinas, D., Seeger, M., Klein, A., Perrone, V., Wistuba, M., and Archambeau, C. (2022). Syne tune: A library for large scale hyperparameter tuning and reproducible research. In International Conference on Automated Machine Learning, pages 16--1. PMLR.

[36] Salinas, D., Shen, H., and Perrone, V. (2020). A quantile-based approach for hyperparameter transfer learning. In International conference on machine learning, pages 8438--8448. PMLR.

[37] Selby, D. A., Jakhmola, R., Sprang, M., Großmann, G., Raki, H., Maani, N., Pavliuk, D., Ewald, J., and Vollmer, S. (2025). Visible neural networks for multi-omics integration: a critical review. Frontiers in Artificial Intelligence, 8:1595291.

[38] Seninge, L., Anastopoulos, I., Ding, H., and Stuart, J. (2021). Vega is an interpretable generative model for inferring biological network activity in single-cell transcriptomics. Nature communications, 12(1):5684.

[39] Simidjievski, N., Bodnar, C., Tariq, I., Scherer, P., Andres Terre, H., Shams, Z., Jamnik, M., and Liò, P. (2019). Variational autoencoders for cancer data integration: design principles and computational practice. Frontiers in genetics, 10:1205.

[40] Snoek, J., Larochelle, H., and Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. In Proceedings of the 25th International Conference on Advances in Neural Information Processing Systems (NeurIPS'12).

[41] Tiao, L., Klein, A., Seeger, M., Archambeau, C., Bonilla, E., and Ramos, F. (2020). Bayesian optimization by density ratio estimation.

[42] Weinstein, J. N., Collisson, E. A., Mills, G. B., Shaw, K. R., Ozenberger, B. A., Ellrott, K., Shmulevich, I., Sander, C., and Stuart, J. M. (2013). The cancer genome atlas pan-cancer analysis project. Nature genetics, 45(10):1113--1120.

[43] Whalen, S., Schreiber, J., Noble, W. S., and Pollard, K. S. (2022). Navigating the pitfalls of applying machine learning in genomics. Nature Reviews Genetics, 23(3):169--181.

[44] Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015a). Learning hyperparameter optimization initializations. In 2015 IEEE international conference on data science and advanced analytics (DSAA), pages 1--10. IEEE.

[45] Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2015b). Sequential model-free hyperparameter tuning. In 2015 IEEE international conference on data mining, pages 1033--1038. IEEE.

[46] Yang, K. D., Belyaeva, A., Venkatachalapathy, S., Damodaran, K., Katcoff, A., Radhakrishnan, A., Shivashankar, G., and Uhler, C. (2021). Multi-domain translation between single-cell imaging and sequencing data using autoencoders. Nature communications, 12(1):31.

[47] Zhu, K., Bendl, J., Rahman, S., Vicari, J. M., Coleman, C., Clarence, T., Latouche, O., Tsankova, N. M., Li, A., Brennand, K. J., et al. (2023). Multi-omic profiling of the developing human cerebral cortex at the single-cell level. Science advances, 9(41):eadg3754.

[48] Zimmer, L., Lindauer, M., and Hutter, F. (2021). Auto-pytorch: Multi-fidelity metalearning for efficient and robust autodl. IEEE transactions on pattern analysis and machine intelligence, 43(9):3079--3090.