A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

Lican Huang; Rui Huang

arxiv: 2605.21528 · v1 · pith:M7NF5ZRKnew · submitted 2026-05-19 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-22 01:28 UTC · model grok-4.3


keywords AutoML · Healthcare Risk Prediction · Pipeline Optimization · Reproducibility · Class Imbalance · Log-driven Analysis · Component Redundancy · Interpretability


The pith

A log-driven AutoML framework reveals that healthcare risk prediction pipelines depend on a small set of interacting components.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a deterministic framework that encodes every pipeline as a traceable log entity to support full reproducibility and component-level analysis. Experiments running more than 18,000 configurations on the Pima Indians Diabetes and Stroke datasets show the search space is structured and partially redundant, with performance driven mainly by augmentation, model choice, and imbalance handling. A sympathetic reader would care because this points to a practical way to simplify AutoML searches in medical settings where data are scarce and imbalanced. If correct, the work implies that exhaustive pipeline trials can be replaced by targeted focus on a few high-impact choices without large losses in performance.

Core claim

By treating each pipeline as a traceable log entity inside the yvsoucom-iterkit framework, analysis of over 18,000 configurations on the Pima Indians Diabetes and Stroke datasets shows a structured and partially redundant search space in which performance is governed by a small subset of interacting components; Random Forest importance ranks augmentation at 0.454 and model choice at 0.198 on Pima while imbalance handling reaches 0.406 on Stroke, and similarity metrics quantify redundancies such as biMax-biMean feature selection (RMS distance 0.0252) and mixup versus no augmentation (0.0279).

What carries the argument

The traceable log entity that records every pipeline configuration, enabling attribution of performance to individual components, measurement of their interactions and similarities, and assessment of cross-seed robustness.
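As a rough illustration of what a traceable log entity could look like, the sketch below records one pipeline run as a frozen dataclass and derives a stable identifier from its configuration. The class name `PipelineLogEntry`, its field names, and the 16-character SHA-256 prefix are all hypothetical choices made for this example; the source does not describe the actual schema used by yvsoucom-iterkit.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class PipelineLogEntry:
    """One pipeline run as a self-describing log entity (hypothetical schema)."""
    augmentation: str
    imbalance: str
    feature_selection: str
    model: str
    seed: int
    metrics: dict = field(default_factory=dict)

    def config(self) -> dict:
        # Everything except the measured metrics defines the configuration.
        d = asdict(self)
        d.pop("metrics")
        return d

    def entry_id(self) -> str:
        # Canonical JSON (sorted keys, fixed separators) gives a stable hash,
        # so the same configuration always maps to the same identifier.
        canonical = json.dumps(self.config(), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    def to_log_line(self) -> str:
        # One JSON object per line keeps the log easy to append to and parse.
        record = {"id": self.entry_id(), **self.config(), "metrics": self.metrics}
        return json.dumps(record, sort_keys=True)


# Illustrative values only; the metric below is a placeholder, not a reported result.
entry = PipelineLogEntry("mixup", "none", "biMax", "random_forest", seed=0,
                         metrics={"weighted_f1": 0.5})
print(entry.to_log_line())
```

Deriving the identifier from the configuration alone, rather than from the metrics, means a re-run of the same pipeline lands on the same ID, which is one simple way to make attribution and cross-seed comparison a matter of grouping log lines.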

If this is right

Effective AutoML optimization can concentrate on a reduced set of high-impact components instead of exploring the full space.
Ensemble models deliver stable high Weighted-F1 scores (0.89 on Pima, 0.94 on Stroke) with lower cross-seed variability than alternatives such as SVM.
Many component variants exhibit strong redundancy, including specific augmentation methods that perform nearly identically to no augmentation.
Macro-F1 remains limited on severely imbalanced data like Stroke even when Weighted-F1 is high.
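The gap between Weighted-F1 and Macro-F1 on imbalanced data follows directly from how the two averages are defined. The sketch below computes both from scratch on a toy label set; the function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed from per-class true/false positives."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    # Unweighted mean: every class counts equally, however rare it is.
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    # Mean weighted by class support: the majority class dominates.
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Toy data: 95 negatives, 5 positives, and a model that finds only one positive.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 95 + [1] + [0] * 4
print(round(weighted_f1(y_true, y_pred), 3), round(macro_f1(y_true, y_pred), 3))
```

On this toy set the weighted score stays high because the majority class is predicted well, while the macro score is pulled down by the poorly recalled minority class. That is the same qualitative pattern reported for Stroke.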

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar log-based redundancy analysis could be applied to other clinical prediction tasks to discover domain-specific shortcuts.
The framework's built-in traceability may help meet regulatory demands for documented model construction in healthcare.
The observed performance-robustness trade-off suggests ensembles as a default choice when reproducibility across random seeds matters.
Future experiments could deliberately prune the redundant components identified here and measure any drop in final accuracy.

Load-bearing premise

The two chosen datasets together with the 18,000 sampled configurations represent the broader space of healthcare risk prediction tasks well enough for the observed component importance rankings to generalize.

What would settle it

Re-running the identical log-driven framework on a new healthcare dataset such as heart-disease prediction and finding that the top-ranked components shift away from augmentation and imbalance handling.

Figures

Figures reproduced from arXiv: 2605.21528 by Lican Huang, Rui Huang.

**Figure 1.** System architecture of the proposed AutoML framework. The framework follows a centralized configuration and deterministic pipeline enumeration strategy, enabling parallel execution and reproducible large-scale experimentation. verification of pipeline behavior. By maintaining access to both intermediate and final outputs, the framework enables fine-grained inspection of model performance beyond aggregate …

**Figure 2.** Random Forest-based component importance for the Pima dataset. Exact importance values are provided in the experimental logs. Random Forest importance reflects sensitivity of performance to component perturbations rather than direct performance improvement or degradation. Component-Specific Analysis. The unified heatmaps (Figure Supplementary C.3 and the Pima and Stroke Supplementary Materials) provide …

**Figure 3.** Detailed Random Forest importance analysis for the Pima dataset.

**Figure 4.** Pima unified metric mega heat map for branches. 4.4.4. Value-Level Component Similarity Analysis. Tables Supplementary D.7 summarize the value-level similarity analysis for the Pima dataset. Ranking value-level RMS similarities reveals structured redundancy patterns across pipeline components. Similarity-Based Insights. Feature Dimensionality: Configurations with 4–8 selected features form a compact similar…

**Figure 5.** Pima cluster10 part vs part heat map. … around 0.81, indicating relatively strong but more variable interactions compared to the Pima dataset. These analyses help identify groups of preprocessing, augmentation, normalization, and modeling components that tend to co-vary in their effects. By focusing on subsets of high-impact or highly variable components, we reduce complexity while highlighting the most influ…

**Figure 6.** Pima cross-seed mean performance and standard deviation across random seeds. Performance and Stability Analysis Under Stochastic Variation. Figure Supplementary F.16 illustrates the joint distribution of mean Macro-F1 and standard deviation across models, highlighting the trade-off between predictive performance and stability. SVM achieves the highest Macro-F1 (0.689) but also exhibits the highest variabi…

Original abstract

Accurate and reproducible disease risk prediction remains challenging due to heterogeneous features, limited samples, and severe class imbalance. This study introduces yvsoucom-iterkit, a deterministic and log-driven automated machine learning framework that formulates pipeline optimization as a fully reproducible, configuration-level system. Each pipeline is encoded as a traceable log entity, enabling analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured and partially redundant search space, where performance is governed by a small subset of interacting components. Random Forest importance analysis identifies augmentation (0.454), model choice (0.198), and imbalance handling (0.101) as key drivers on Pima, while imbalance handling dominates Stroke (0.406). Component similarity analysis shows strong redundancy, with feature selection variants (biMax-biMean) exhibiting low RMS distance (0.0252), mixup closely matching no augmentation (0.0279), and TomekLinks aligning with no imbalance handling (0.0325), whereas Gaussian noise shows greater divergence from no augmentation (0.10). The framework achieves strong and stable performance using ensemble models (Weighted-F1 0.89, Macro-F1 0.88 on Pima; Weighted-F1 0.94 on Stroke), while Macro-F1 remains lower on Stroke (0.67) due to class imbalance. Cross-seed analysis reveals a performance-robustness trade-off, with ensembles showing lower variability (0.023-0.026) than SVM. These results indicate that effective AutoML optimization can focus on a reduced set of high-impact components.
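The abstract reports RMS distances between component values but does not give the formula here. One plausible reading, sketched below, pairs runs that agree on every other component and takes the root mean square of the metric gap between the two values being compared. The function `rms_distance`, the run layout (`{"config": {...}, "<metric>": value}`), and the toy scores are assumptions for illustration and may differ from the paper's definition.

```python
import math


def rms_distance(runs, component, value_a, value_b, metric="weighted_f1"):
    """RMS gap in `metric` between two values of one component, taken over
    configurations that agree on every other component (assumed definition)."""
    def rest_key(run):
        # Identify a run by all of its settings except the compared component.
        return tuple(sorted((k, v) for k, v in run["config"].items() if k != component))

    by_rest = {}
    for run in runs:
        value = run["config"][component]
        if value in (value_a, value_b):
            by_rest.setdefault(rest_key(run), {})[value] = run[metric]

    # Keep only contexts in which both values were actually evaluated.
    diffs = [pair[value_a] - pair[value_b]
             for pair in by_rest.values()
             if value_a in pair and value_b in pair]
    if not diffs:
        raise ValueError("no matched configurations for these two values")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Placeholder scores, not results from the paper.
runs = [
    {"config": {"augmentation": "mixup", "model": "rf"}, "weighted_f1": 0.80},
    {"config": {"augmentation": "none", "model": "rf"}, "weighted_f1": 0.83},
    {"config": {"augmentation": "mixup", "model": "svm"}, "weighted_f1": 0.70},
    {"config": {"augmentation": "none", "model": "svm"}, "weighted_f1": 0.66},
]
print(rms_distance(runs, "augmentation", "mixup", "none"))
```

Under this reading, a small distance means that swapping one value for the other rarely moves the metric in any context, which is what makes the pair a candidate for pruning.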

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper adds log-based traceability to AutoML for healthcare pipelines and documents redundancy across 18k runs, but the importance rankings shift between the two similar datasets tested.


The main thing to know is that this work turns AutoML pipeline search into a logged, analyzable process and uses the results to point out redundancy among components on two healthcare datasets. They encode each pipeline as a traceable log entity, which supports straightforward attribution of performance to specific choices, similarity checks between variants, and seed-robustness tests. Running more than 18,000 configurations on the Pima and Stroke data, then fitting Random Forest models for importance and computing RMS distances for similarity, produces clear numbers on what matters and what overlaps. The deterministic setup and the observation that some variants like mixup sit close to no augmentation are practical for anyone trying to shrink search spaces without losing much performance. The ensemble results with low variability also line up with the reported metrics.

The soft spots sit in the scope and the data choice. Both datasets are small binary classification problems with comparable imbalance, and the top drivers already differ: augmentation ranks highest on Pima while imbalance handling ranks highest on Stroke. That undercuts the broader claim of a generally small subset of interacting components that governs healthcare risk pipelines. The sampling method for the 18,000 configurations is not detailed, which leaves open whether the observed structure covers the space evenly.

This is aimed at people building or using AutoML tools for medical prediction who want built-in reproducibility and post-hoc explanations. A reader focused on pipeline optimization or healthcare ML could use the redundancy patterns and the log-entity idea directly. It has enough experimental scale and internal consistency to deserve a serious referee, though the generalizability of the rankings would need more datasets to hold up. I would send it for peer review with a request to add diverse tasks and clarify the configuration sampling.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces yvsoucom-iterkit, a deterministic log-driven AutoML framework that encodes each pipeline as a traceable log entity to support reproducible optimization and post-hoc interpretability analysis (component attribution, interactions, similarity, and cross-seed robustness). Experiments across more than 18,000 configurations on the Pima Indians Diabetes and Stroke datasets are used to argue that the pipeline search space is structured and partially redundant, with performance governed by a small subset of interacting components; Random Forest importance identifies augmentation (0.454) and model choice (0.198) as top drivers on Pima while imbalance handling (0.406) dominates on Stroke, and RMS-distance similarity analysis quantifies redundancies such as biMax-biMean (0.0252) and mixup vs. none (0.0279). Ensemble models achieve stable Weighted-F1 scores of 0.89–0.94.

Significance. If the central empirical claims hold, the work provides a concrete, log-centric methodology for making AutoML pipelines both reproducible and interpretable in healthcare settings, where the ability to trace and reduce the effective search space to a small set of high-impact components could be practically useful. The explicit quantification of component redundancy via RMS distances and the reporting of cross-seed variability constitute strengths that go beyond typical black-box AutoML results.

major comments (2)

[Abstract] Abstract: the central claim that 'performance is governed by a small subset of interacting components' rests on experiments performed on only two binary, imbalanced classification datasets whose top importance rankings already diverge (augmentation 0.454 on Pima vs. imbalance handling 0.406 on Stroke); this divergence indicates that the identified high-impact subset may be an artifact of the chosen data distributions rather than a general property of healthcare risk pipelines.
[Abstract] Abstract: the manuscript provides no description of the sampling strategy used to generate the 18,000 pipeline configurations or of any post-hoc filtering; without this information the Random Forest importance scores and the RMS-distance redundancy results cannot be reliably interpreted as evidence of an intrinsically structured search space.

minor comments (2)

[Abstract] Abstract: the framework name 'yvsoucom-iterkit' is introduced without explanation of its construction or intended meaning.
[Abstract] Abstract: the reported cross-seed variability (0.023–0.026) should explicitly state the underlying performance metric (e.g., standard deviation of Weighted-F1 across seeds).
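The last minor comment asks which statistic the cross-seed variability refers to. A minimal sketch of one candidate, the sample standard deviation of a chosen metric across seeds for each model, is shown below. The function name, the record layout, and the default metric are assumptions for illustration; the source does not confirm that this is the statistic behind the reported 0.023–0.026.

```python
import statistics
from collections import defaultdict


def cross_seed_summary(runs, metric="macro_f1"):
    """Group runs by model and report mean and sample std of `metric` across seeds."""
    by_model = defaultdict(list)
    for run in runs:
        by_model[run["model"]].append(run[metric])
    return {
        model: {
            "mean": statistics.mean(scores),
            # stdev needs at least two seeds; report 0.0 for a single run.
            "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
            "n_seeds": len(scores),
        }
        for model, scores in by_model.items()
    }


# Placeholder scores, not results from the paper.
runs = [
    {"model": "ensemble", "seed": 0, "macro_f1": 0.60},
    {"model": "ensemble", "seed": 1, "macro_f1": 0.62},
    {"model": "svm", "seed": 0, "macro_f1": 0.55},
    {"model": "svm", "seed": 1, "macro_f1": 0.75},
]
for model, stats in cross_seed_summary(runs).items():
    print(model, stats)
```

Reporting the metric name and the number of seeds alongside each standard deviation, as the returned dictionary does, would resolve the ambiguity the referee points out.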

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each major comment below, indicating where revisions will be made to improve clarity and completeness without altering the core claims or experimental results.


Referee: [Abstract] Abstract: the central claim that 'performance is governed by a small subset of interacting components' rests on experiments performed on only two binary, imbalanced classification datasets whose top importance rankings already diverge (augmentation 0.454 on Pima vs. imbalance handling 0.406 on Stroke); this divergence indicates that the identified high-impact subset may be an artifact of the chosen data distributions rather than a general property of healthcare risk pipelines.

Authors: The manuscript already reports the dataset-specific rankings (augmentation highest on Pima, imbalance handling highest on Stroke) to illustrate that the governing components can vary while still forming a small subset. This supports the claim of a structured search space rather than contradicting it. We acknowledge the limitation of using only two datasets and will revise the abstract and add a paragraph in the discussion to explicitly state that the findings apply to the evaluated healthcare risk prediction tasks on these datasets. The framework itself is presented as a general tool for identifying such subsets per task. No new experiments are feasible at this stage, but the scope will be clarified.

revision: partial

Referee: [Abstract] Abstract: the manuscript provides no description of the sampling strategy used to generate the 18,000 pipeline configurations or of any post-hoc filtering; without this information the Random Forest importance scores and the RMS-distance redundancy results cannot be reliably interpreted as evidence of an intrinsically structured search space.

Authors: We agree that a description of how the configurations were generated is necessary for interpreting the importance and redundancy analyses. We will add a dedicated subsection to the Methods section detailing the sampling strategy: the 18,000+ configurations were produced by systematically enumerating all valid combinations of the pipeline components (augmentation variants, imbalance handlers, feature selectors, and models) within the framework's defined search space, with no post-hoc filtering applied. This revision will allow readers to assess the results as evidence of structure in the explored space.

revision: yes
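The systematic enumeration described in this response can be sketched as a Cartesian product over component values, visited in a fixed order. The search space below, the `is_valid` placeholder, and the function name `enumerate_configs` are hypothetical; the framework's real component lists and validity rules are not given in the source.

```python
import itertools

# Hypothetical component values, chosen only to illustrate the idea.
SEARCH_SPACE = {
    "augmentation": ["none", "mixup", "gaussian_noise"],
    "imbalance": ["none", "tomek_links", "smote"],
    "feature_selection": ["biMax", "biMean"],
    "model": ["random_forest", "svm"],
}


def is_valid(config):
    # Placeholder rule; a real framework would encode its own constraints here.
    return True


def enumerate_configs(space, valid=is_valid):
    """Yield (index, config) for every valid combination in a reproducible order."""
    names = sorted(space)  # sorted keys: order does not depend on dict construction
    combos = itertools.product(*(space[name] for name in names))
    for index, values in enumerate(combos):
        config = dict(zip(names, values))
        if valid(config):
            yield index, config


configs = list(enumerate_configs(SEARCH_SPACE))
print(len(configs))  # 3 * 3 * 2 * 2 combinations
```

Because the order is fixed and each configuration carries its enumeration index, the list can be split across parallel workers and still be reassembled into the same log, which fits the deterministic enumeration the response describes.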

Circularity Check

0 steps flagged

No significant circularity; empirical results from independent pipeline runs

full rationale

The paper introduces a logging framework for AutoML pipelines and then executes >18,000 configurations on two fixed datasets, recording performance metrics. Component importances and redundancy are computed post-hoc via Random Forest fits and RMS distances on those observed outcomes. No step reduces a claimed result to a fitted parameter by construction, nor does any self-citation or definitional loop carry the central claim about search-space structure. The findings rest on externally generated experimental data rather than on the framework's own definitions being re-used as both input and output. This is a standard reproducible empirical study whose derivation chain remains self-contained against the reported runs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard supervised learning assumptions plus the modeling choice that exhaustive enumeration of 18,000 configurations adequately probes the space; no new physical entities are postulated.

free parameters (1)

Pipeline configuration choices
Hyperparameters and component selections across augmentation, model, and imbalance modules are optimized over the enumerated space.

axioms (1)

domain assumption: Random Forest feature importance reliably ranks the contribution of pipeline components to final performance
Invoked when reporting augmentation (0.454) and other importance values on the Pima dataset.
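Since the ledger lists Random Forest importance as an assumption, one way to probe it is to compare its ranking against a simpler, model-free statistic computed from the same logs. The sketch below uses the share of metric variance explained by each component's group means (eta-squared from a one-way layout). This is a swapped-in technique for cross-checking, not the paper's Random Forest importance, and the function name and run layout are hypothetical.

```python
from collections import defaultdict


def main_effect_share(runs, component, metric="weighted_f1"):
    """Fraction of metric variance explained by one component's group means
    (eta-squared); a cross-check, NOT Random Forest importance."""
    scores = [run[metric] for run in runs]
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    if total_ss == 0:
        return 0.0  # metric never varies, so no component explains anything

    groups = defaultdict(list)
    for run in runs:
        groups[run["config"][component]].append(run[metric])

    # Between-group sum of squares: how far each value's mean sits from the grand mean.
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in groups.values()
    )
    return between_ss / total_ss


# Placeholder scores in which only the model choice moves the metric.
runs = [
    {"config": {"augmentation": "none", "model": "rf"}, "weighted_f1": 0.8},
    {"config": {"augmentation": "mixup", "model": "rf"}, "weighted_f1": 0.8},
    {"config": {"augmentation": "none", "model": "svm"}, "weighted_f1": 0.6},
    {"config": {"augmentation": "mixup", "model": "svm"}, "weighted_f1": 0.6},
]
for name in ("augmentation", "model"):
    print(name, round(main_effect_share(runs, name), 3))
```

This statistic captures main effects only and ignores interactions, so it will not reproduce Random Forest importance exactly. Large disagreements between the two rankings would still be a useful signal that the assumption deserves a closer look.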



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear


Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured and partially redundant search space, where performance is governed by a small subset of interacting components. Random Forest importance analysis identifies augmentation (0.454), model choice (0.198), and imbalance handling (0.101) as key drivers on Pima.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_injective unclear


Component similarity analysis shows strong redundancy, with feature selection variants (biMax-biMean) exhibiting low RMS distance (0.0252), mixup closely matching no augmentation (0.0279).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Gustavo E. A. P. A. Batista, Ronaldo C. Prati, and Maria Carolina Monard. A study of the behavior of several methods for balancing machinelearningtrainingdata.ACMSIGKDDexplorationsnewsletter, 6(1):20–29, 2004

work page 2004

[2] [2]

Hyperparameter optimization: Foun- dations, algorithms, best practices, and open challenges.Wiley InterdisciplinaryReviews:DataMiningandKnowledgeDiscovery,13 (2):e1484, 2023

Bernd Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, Marc Becker, Anne-Laure Boulesteix, et al. Hyperparameter optimization: Foun- dations, algorithms, best practices, and open challenges.Wiley InterdisciplinaryReviews:DataMiningandKnowledgeDiscovery,13 (2):e1484, 2023. R. Huang et al.:Prepri...

work page 2023

[3] [3]

Random forests.Machine Learning, 45(1):5–32, 2001

Leo Breiman. Random forests.Machine Learning, 45(1):5–32, 2001

work page 2001

[4] [4]

A survey on feature selection methods.Computers & Electrical Engineering, 40(1):16–28, 2014

Girish Chandrashekar and Ferat Sahin. A survey on feature selection methods.Computers & Electrical Engineering, 40(1):16–28, 2014. ISSN 0045-7906. doi: 10.1016/j.compeleceng.2013.11.024. URL https://www.sciencedirect.com/science/article/pii/S0045790613003

work page doi:10.1016/j.compeleceng.2013.11.024 2014

[5] [5]

40th-year commemorative issue

work page

[6] [6]

Chawla, Kevin W

Nitesh V. Chawla, Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002

work page 2002

[7] [7]

Xgboost: A scalable tree boosting system.Proceedings of the 22nd ACM SIGKDD International ConferenceonKnowledgeDiscoveryandDataMining,pages785–794, 2016

Tianqi Chen and Carlos Guestrin. Xgboost: A scalable tree boosting system.Proceedings of the 22nd ACM SIGKDD International ConferenceonKnowledgeDiscoveryandDataMining,pages785–794, 2016

work page 2016

[8] [8]

AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pierre Larroy, Mu Li, and Alex Smola. Autogluon-tabular: Robust and accurate automl for structured data.arXiv preprint arXiv:2003.06505, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2003

[9] [9]

Efficient and robust automated machine learning.Advances in neural information processing systems, 28, 2015

Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springen- berg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning.Advances in neural information processing systems, 28, 2015

work page 2015

[10] Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. Auto-sklearn 2.0: Hands-free AutoML via meta-learning. Journal of Machine Learning Research, 23(261):1–61, 2022.

[11] Léo Grinsztajn, Edouard Oyallon, and Gaël Varoquaux. Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems, 35:507–520, 2022.

[12] Isabelle Guyon and André Elisseeff. An Introduction to Feature Extraction, pages 1–25. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006.

[13] Benjamin Haibe-Kains, George Alexandru Adam, et al. Transparency and reproducibility in artificial intelligence. Nature, 586(7829):E14–E16, 2020.

[14] Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), pages 1322–1328. IEEE, 2008.

[15] Xin He, Kaiyong Zhao, and Xiaowen Chu. AutoML: A survey of the state-of-the-art. Knowledge-Based Systems, 212:106622, 2021. ISSN 0950-7051. doi: 10.1016/j.knosys.2020.106622. URL https://www.sciencedirect.com/science/article/pii/S0950705120307516

[16] Rui Huang and Lican Huang. An AutoML system for improving diabetes prediction by auto-optimization of preprocessing and machine learning models. SSRN preprint, 2026. Available at https://ssrn.com/abstract=5898268 or http://dx.doi.org/10.2139/ssrn.5898268

[17] Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren. Automated machine learning: methods, systems, challenges. Springer Nature, 2019.

[18] Erin LeDell and Sebastien Poirier. H2O AutoML: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML, 2020.

[19] Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P Trevino, Jiliang Tang, and Huan Liu. Feature selection: A data perspective. ACM Computing Surveys (CSUR), 50(6):1–45, 2017.

[20] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.

[21] Randal S Olson, Ryan J Urbanowicz, Peter C Andrews, Nicole A Lavender, La Creis Kidd, and Jason H Moore. Automating biomedical data science through tree-based pipeline optimization. In European Conference on the Applications of Evolutionary Computation, pages 123–137. Springer, 2016.

[22] Abhilash Pati, Manoranjan Parhi, and Binod Kumar Pattanayak. A review on prediction of diabetes using machine learning and data mining classification techniques. International Journal of Biomedical Engineering and Technology, 41(1):83–109, 2023.

[23] Tina R Patil and Swati Sunil Sherekar. Performance analysis of Naive Bayes and J48 classification algorithm for data classification. International Journal of Computer Science and Applications, 6(2):256–261, 2013.

[24] Joelle Pineau, Philippe Vincent-Lamarre, Koustuv Sinha, Vincent Larivière, Alina Beygelzimer, Florence d’Alché Buc, Emily Fox, and Hugo Larochelle. Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program). Journal of Machine Learning Research, 22(164):1–20, 2021.

[25] Yvan Saeys, Inaki Inza, and Pedro Larranaga. A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517, 2007.

[26] Abid Sarwar, Mehbob Ali, Jatinder Manhas, and Vinod Sharma. Diagnosis of diabetes type-II using hybrid machine learning based ensemble model. International Journal of Information Technology, 12(2):419–428, 2020.

[27] Benjamin Shickel, Patrick James Tighe, Azra Bihorac, and Parisa Rashidi. Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal of Biomedical and Health Informatics, 22(5):1589–1604, 2017.

[28] M. S. Singh, K. Thongam, P. Choudhary, et al. Stroke risk prediction and prevention: Traditional versus machine learning approaches. Archives of Computational Methods in Engineering, 2025. doi: 10.1007/s11831-025-10406-5

[29] Chris Thornton, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 847–855, 2013.

[30] Ivan Tomek. Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(11):769–772, 1976.

[31] Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimization. In International Conference on Learning Representations (ICLR), pages 1–13, 2018.

[32] Marc-André Zöller and Marco F. Huber. Benchmark and survey of automated machine learning frameworks. Journal of Artificial Intelligence Research, 70:409–472, 2021.

R. Huang et al.: Preprint submitted to Elsevier

Supplementary Materials Overview

This document presents the comprehensive set of experime...