arxiv: 2605.08963 · v1 · submitted 2026-05-09 · 📊 stat.ML · cs.LG

Survey-aware Machine Learning: A Guideline for Valid Population Health Inference based on Scoping Review

Alex A. T. Bui, Henry W. Zheng, Jeffrey Feng, YongKyung Oh

Pith reviewed 2026-05-12 01:47 UTC · model grok-4.3

keywords: survey data, machine learning, population inference, sampling weights, health surveys, bias, fairness, guidelines

The pith

A nine-step guideline integrates survey design into machine learning to produce valid population health inferences from data like NHANES.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Machine learning models trained on complex health surveys routinely ignore primary sampling units, stratification, and sampling weights. This practice violates independence assumptions and produces biased estimates, understated uncertainty, and fairness assessments that do not reflect population disparities. The paper proposes Survey-aware Machine Learning, a nine-step guideline that folds survey design metadata into every stage of the modeling pipeline. A scoping review of sixteen prior methodological papers identifies existing techniques for weighted training and design-based evaluation while noting gaps in hyperparameter tuning and deployment. The guideline supplies task-specific instructions so that different analytical goals receive the appropriate sequence of adjustments.

Core claim

The paper claims that standard machine learning workflows applied to survey data such as NHANES violate the independence assumptions underlying most training and evaluation procedures, and that a nine-step Survey-aware Machine Learning guideline remedies this by embedding primary sampling units, stratification variables, and sampling weights throughout data preparation, model fitting, validation, performance assessment, and deployment.
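
To make the mechanism concrete, here is a minimal sketch (not the paper's code) of how a sampling weight changes a prevalence estimate when one group is oversampled. The function name `weighted_mean` and the toy values are hypothetical and chosen only for illustration; the estimator is the usual ratio of the weighted total to the sum of the weights.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical toy sample: 1 = has the condition, 0 = does not.
# The first three respondents come from an oversampled group (small weights);
# the last one stands in for many people in the population (large weight).
outcome = [1, 1, 1, 0]
weights = [100, 100, 100, 900]

unweighted_prevalence = sum(outcome) / len(outcome)
weighted_prevalence = weighted_mean(outcome, weights)

print(unweighted_prevalence)  # share of the sample with the condition
print(weighted_prevalence)    # estimated share of the population
```

In this toy case the sample proportion is three times the weighted estimate, which is the kind of gap the core claim is concerned with.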

What carries the argument

The nine-step Survey-aware Machine Learning (SaML) guideline that places survey design metadata at every point in the machine learning lifecycle.

If this is right

Population estimates of health outcomes become representative of the full target population rather than the sampled individuals alone.
Uncertainty intervals properly reflect the complex sampling structure and avoid overconfidence.
Fairness evaluations capture true population disparities instead of sample-specific artifacts.
Task-specific variants of the guideline tell users exactly which steps apply to prediction, descriptive inference, or other objectives.
Explicit attention is directed to previously under-addressed stages such as hyperparameter tuning and model deployment.
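
One reason uncertainty intervals come out too narrow when weights are ignored is that unequal weights reduce the effective sample size. The sketch below uses Kish's approximation, which is a textbook formula and not something taken from the paper; the function name and the toy weights are hypothetical. It captures only the effect of weight variability and says nothing about clustering or stratification.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    total = sum(weights)
    return total * total / sum_sq


# Hypothetical weights: equal weights keep the nominal sample size,
# highly unequal weights shrink it.
n_eff_equal = kish_effective_sample_size([2, 2, 2, 2])
n_eff_unequal = kish_effective_sample_size([100, 100, 100, 900])

print(n_eff_equal)
print(n_eff_unequal)
```

A standard error computed as if all four unequal-weight respondents carried equal information would therefore be optimistic.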

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same checklist structure could be adapted for other data types that violate independence, such as clustered or time-series observations.
Automated software wrappers enforcing the nine steps would reduce the practical barrier for analysts working with public survey files.
Requiring SaML compliance in public health ML pipelines could improve reproducibility of published findings.
Empirical tests on surveys other than NHANES would clarify how far the guideline generalizes.

Load-bearing premise

That following the nine prescribed steps will eliminate the bias, underestimated uncertainty, and invalid fairness results caused by ignoring survey design features.

What would settle it

A head-to-head comparison on the same NHANES dataset showing that population-level estimates, confidence intervals, and fairness metrics remain materially unchanged when the nine-step guideline is followed versus when standard machine learning is used.
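
One ingredient of such a comparison can be sketched as follows: score the same predictions with and without sampling weights and see whether the metric moves. This is an illustrative, assumption-laden sketch rather than the authors' evaluation code. The function `weighted_auc` and the toy data are hypothetical, the pairwise formula is the standard weighted concordance definition of AUC, and only a point estimate is produced (a design-based confidence interval would additionally need replicate weights or linearization, which is not shown).

```python
def weighted_auc(scores, labels, weights):
    """Weighted concordance AUC over all positive/negative pairs.

    Each pair contributes the product of its two weights; ties in score
    count as half-concordant.
    """
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    total = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            pair_weight = w_pos * w_neg
            total += pair_weight
            if s_pos > s_neg:
                concordant += pair_weight
            elif s_pos == s_neg:
                concordant += 0.5 * pair_weight
    return concordant / total


# Hypothetical predictions for four respondents.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]

auc_unweighted = weighted_auc(scores, labels, [1, 1, 1, 1])
# The poorly ranked positive case now represents three times as many people.
auc_weighted = weighted_auc(scores, labels, [1, 1, 3, 1])

print(auc_unweighted)
print(auc_weighted)
```

If the two numbers were essentially identical across realistic NHANES models, that would count against the premise; a material gap would support it.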

Figures

Figures reproduced from arXiv: 2605.08963 by Alex A. T. Bui, Henry W. Zheng, Jeffrey Feng, YongKyung Oh.

**Figure 1.** Sample composition vs. population estimates by age group (NHANES 2021–2023). Older adults are oversampled to enable precise subgroup estimation.

**Figure 2.** Comparison of unweighted and weighted estimates for continuous variables (Age, BMI, Systolic BP, Diastolic BP, from left to right). Error bars indicate 95% confidence intervals.

**Figure 3.** Sample composition vs. population estimates by age group (left) and race/ethnicity (right).

**Figure 4.** ROC curves under standard evaluation (left) and survey-weighted evaluation (right).

**Figure 5.** Precision-Recall curves under standard evaluation (left) and survey-weighted evaluation (right).

Original abstract

Machine Learning (ML) models trained on complex health surveys such as the National Health and Nutrition Examination Survey (NHANES) often ignore primary sampling units, stratification variables, and sampling weights. This practice violates the independence assumptions of standard evaluation methods. As a result, estimates become biased, uncertainty is underestimated, and fairness assessments fail to reflect population-level disparities. We propose Survey-aware Machine Learning (SaML), a nine-step guideline that incorporates survey design metadata across the ML lifecycle. Through a scoping review of 16 methodological papers, we summarize existing work on weighted model training, design-based cross-validation, and survey-adjusted performance evaluation. We also identify gaps in hyperparameter tuning and deployment. We provide task-specific guidance that clarifies which steps are required for different analytical objectives. SaML provides a checklist for valid population inference from survey data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SaML is a competent synthesis of survey methods into a nine-step checklist but provides no test that the steps actually fix bias or uncertainty issues.

The main thing to know is that this paper turns a scoping review of 16 methodological papers into a structured nine-step guideline for applying ML to complex survey data like NHANES, but it never checks whether following those steps improves population inferences. The authors organize existing advice on weighted training, design-based cross-validation, and survey-adjusted evaluation into a lifecycle checklist and flag real gaps such as hyperparameter tuning under survey design constraints. The task-specific notes for different goals, such as prediction versus fairness analysis, are a practical addition that could help readers decide what to prioritize. That synthesis is the paper's main contribution, and it pulls scattered techniques into one place without obvious circularity.

The clear limitation is the total absence of validation. There is no application of the full SaML pipeline to a dataset, no before-and-after comparison on bias or coverage, and no simulation showing better uncertainty estimates or fairness results than standard ML or simpler survey adjustments. The claim that incorporating weights, strata, and PSUs at each stage resolves the problems rests on an untested assumption that the steps are sufficient and complete. The scoping review itself may have missed some recent work, but the bigger issue is the lack of empirical grounding for the guideline as a whole.

The paper is aimed at applied researchers in health data science who already handle survey data and want a consolidated checklist rather than a new method. Readers looking for a starting framework could find it useful for organizing their thinking. It deserves a serious referee because the literature summary is solid and the identified gaps are worth public discussion, even though the authors would need to add at least a small case study or simulation to make the central claim convincing. I would send it for review with that expectation rather than desk reject.

Referee Report

1 major / 2 minor

Summary. The manuscript conducts a scoping review of 16 methodological papers on survey-adjusted machine learning techniques and proposes Survey-aware Machine Learning (SaML), a nine-step guideline for incorporating sampling weights, strata, and primary sampling units across the ML lifecycle when analyzing complex health surveys such as NHANES. It summarizes existing approaches to weighted training, design-based cross-validation, and adjusted performance evaluation, identifies gaps in hyperparameter tuning and deployment, and offers task-specific guidance for different analytical objectives to support valid population inference.
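
As a rough illustration of what design-based cross-validation involves, the sketch below assigns folds at the level of primary sampling units within strata, so that respondents from the same PSU never appear on both sides of a train/validation split. This is one simple scheme under stated assumptions, not necessarily the procedure used in the reviewed papers. The function name `assign_design_folds`, the deterministic round-robin rule, and the toy identifiers are all hypothetical.

```python
from collections import defaultdict


def assign_design_folds(strata, psus, n_folds):
    """Return one fold index per row.

    Every row from the same (stratum, PSU) cluster gets the same fold, and
    PSUs are spread across folds in round-robin order within each stratum.
    """
    if len(strata) != len(psus):
        raise ValueError("strata and psus must have the same length")
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")

    # Collect the distinct PSUs observed in each stratum.
    psus_by_stratum = defaultdict(set)
    for stratum, psu in zip(strata, psus):
        psus_by_stratum[stratum].add(psu)

    # Deterministic assignment; a real pipeline might shuffle with a seed.
    fold_of_cluster = {}
    for stratum in sorted(psus_by_stratum):
        for position, psu in enumerate(sorted(psus_by_stratum[stratum])):
            fold_of_cluster[(stratum, psu)] = position % n_folds

    return [fold_of_cluster[(s, p)] for s, p in zip(strata, psus)]


# Hypothetical design: two strata, two PSUs each, two respondents per PSU.
strata = [1, 1, 1, 1, 2, 2, 2, 2]
psus = [1, 1, 2, 2, 1, 1, 2, 2]
folds = assign_design_folds(strata, psus, n_folds=2)
print(folds)
```

Splitting rows at random instead would let correlated respondents from one PSU leak across folds, which is the independence violation the manuscript describes.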

Significance. The structured synthesis of existing survey-aware techniques into a checklist format could help standardize practices and reduce common errors in bias, uncertainty, and fairness estimation for population health applications. The scoping review consolidates dispersed methodological work, and the task-specific recommendations add practical value. However, the absence of any empirical demonstration that the full guideline improves outcomes limits its immediate contribution beyond a literature summary.

major comments (1)

§3 (SaML Guideline): The central claim that the nine-step SaML guideline resolves bias, underestimated uncertainty, and invalid fairness assessments when ML is applied to survey data is unsupported by evidence. The manuscript presents no simulation study, real-data case study, before/after comparison, or benchmark against standard ML or existing survey methods to show that following the steps produces the claimed improvements in inference validity.

minor comments (2)

Scoping Review section: The methods for identifying and selecting the 16 papers (search strategy, databases, inclusion/exclusion criteria) are not described in sufficient detail to allow replication or assessment of coverage.
Task-specific guidance: A summary table mapping each of the nine steps to the analytical objectives (e.g., prediction vs. inference vs. fairness) would improve clarity and usability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and for identifying the need to clarify the scope and evidentiary basis of our proposed guideline. We address the major comment below.

Referee: The central claim that the nine-step SaML guideline resolves bias, underestimated uncertainty, and invalid fairness assessments when ML is applied to survey data is unsupported by evidence. The manuscript presents no simulation study, real-data case study, before/after comparison, or benchmark against standard ML or existing survey methods to show that following the steps produces the claimed improvements in inference validity.

Authors: We agree that the manuscript does not contain new empirical validation of the complete SaML guideline. As a scoping review, the paper synthesizes findings from the 16 included methodological papers, each of which provides evidence for specific components (weighted training, design-based cross-validation, and adjusted performance metrics). The nine-step guideline consolidates these existing approaches into a unified checklist rather than introducing or empirically testing a novel method. We will revise the manuscript to explicitly state that SaML is a literature-derived best-practice framework, tone down any implication of comprehensive resolution, and add a limitations section noting the absence of a unified empirical demonstration of the full guideline. This revision will also highlight the need for future studies to benchmark SaML against standard ML pipelines.

Revision: partial

Circularity Check

0 steps flagged

No circularity: SaML guideline is a synthesis from external scoping review

The paper's central contribution is a nine-step guideline synthesized via scoping review of 16 external methodological papers on survey-adjusted ML. No equations, fitted parameters, or predictions are defined in terms of the target result. The derivation chain consists of summarizing existing techniques (weighted training, design-based CV, adjusted evaluation) and identifying gaps; this does not reduce to self-definition, self-citation load-bearing, or renaming of known results by construction. The manuscript is self-contained against external benchmarks and contains no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that survey design features must be integrated at every ML stage and that the scoping review of 16 papers provides a sufficient basis for the guideline.

axioms (1)

Domain assumption: Complex survey designs with primary sampling units, stratification, and weights violate the independence assumptions of standard ML evaluation methods.
Core premise stated directly in the abstract as the source of biased estimates and underestimated uncertainty.
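
For readers who want the premise in quantitative form, a common textbook approximation (Kish's design effect, not a formula taken from the paper) relates cluster sampling to lost precision. It assumes equal cluster sizes $m$ and an intraclass correlation $\rho$ among respondents in the same primary sampling unit; both symbols are introduced here for illustration.

```latex
\[
\mathrm{deff} \approx 1 + (m - 1)\,\rho,
\qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{deff}}
\]
```

As a worked example under these assumptions, $m = 20$ and $\rho = 0.05$ give $\mathrm{deff} \approx 1.95$, so a nominal sample of $n = 1000$ carries roughly the information of 513 independent observations. Methods that treat all 1000 rows as independent would understate the variance accordingly.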

invented entities (1)

Survey-aware Machine Learning (SaML): no independent evidence
purpose: Nine-step guideline for incorporating survey design metadata across the ML lifecycle
Newly proposed framework synthesized from prior work with no independent empirical validation reported.

pith-pipeline@v0.9.0 · 5451 in / 1395 out tokens · 70623 ms · 2026-05-12T01:47:09.141681+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
We propose Survey-aware Machine Learning (SaML), a nine-step guideline that incorporates survey design metadata across the ML lifecycle... weighted model training, design-based cross-validation, and survey-adjusted performance evaluation.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Table 5: SaML Steps by Machine Learning Task... Prediction requires S1,S2,S3,S5 etc.
