arxiv: 2604.14514 · v1 · submitted 2026-04-16 · 💻 cs.AI · cs.CE

Perspective on Bias in Biomedical AI: Preventing Downstream Healthcare Disparities

Michal Rosen-Zvi , Yoav Kan-Tor , Michael Danziger , Agata Ferretti , Javier Aula-Blasco , Julia Falcao , Ron Shamir , Mordechai Muszkat

Pith reviewed 2026-05-10 11:47 UTC · model grok-4.3

keywords: biomedical AI, omics data bias, foundation models, healthcare disparities, ancestry reporting, data provenance, AI equity, population bias

The pith

Biases introduced during omics data collection get locked into biomedical foundation models and risk producing downstream healthcare inequities that later regulatory interventions cannot fully reverse.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that most omics studies omit ancestry or ethnicity details, and the large public datasets used to pretrain models are overwhelmingly European in origin. Because foundation models are pretrained once on these collections and then reused across many tasks, any early skew in population representation spreads automatically to clinical tools and diagnostic aids. The authors argue this creates a form of bias that regulatory checks applied at the point of clinical deployment cannot undo. They therefore advocate shifting attention to three upstream practices: tracking data provenance, requiring demographic openness, and demanding transparent performance evaluation across groups.

Core claim

As biomedical foundation models become central to discovery through repeated reuse of models pretrained on large omics collections, the documented underreporting of ancestry and strong European dominance in those collections will be perpetuated and amplified, producing performance gaps and health inequities for non-European populations that regulatory interventions at later stages cannot fully reverse.

What carries the argument

The pretraining-and-reuse paradigm for foundation models, which transfers population skews present in source omics datasets into every downstream application.

If this is right

Regulatory interventions applied only at clinical deployment will leave early-stage data biases intact.
Community adoption of Provenance, Openness, and Evaluation Transparency practices would reduce the risk of irreversible inequities.
Biomedical AI tools will serve underserved populations more effectively once demographic composition of training data is routinely disclosed and evaluated.
Repeated reuse of the same biased base models across tasks will compound rather than dilute the initial population skew.
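
The disclosure practice named above can be made concrete with a small sketch. It assumes sample metadata arrives as a list of dictionaries with an `ancestry` field; that layout, the field name, and the toy records are illustrative assumptions, not the schema of CellxGene, GEO, or the paper. Missing values are counted as "unreported" rather than dropped, so the reporting gap stays visible next to the population skew.

```python
from collections import Counter


def ancestry_composition(records, field="ancestry"):
    """Return the share of samples per ancestry label.

    Missing or empty values are counted as 'unreported' instead of being
    dropped. The field name is a hypothetical choice for this sketch.
    """
    labels = []
    for rec in records:
        value = rec.get(field)
        if isinstance(value, str) and value.strip():
            labels.append(value.strip().lower())
        else:
            labels.append("unreported")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Toy metadata, invented for illustration only.
samples = [
    {"id": "s1", "ancestry": "European"},
    {"id": "s2", "ancestry": "European"},
    {"id": "s3", "ancestry": "African"},
    {"id": "s4", "ancestry": ""},
    {"id": "s5"},
]
shares = ancestry_composition(samples)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

A summary like this could accompany a dataset or model release; how labels are harmonized across studies is a separate problem that the sketch does not address.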

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Future work could test whether adding even modest amounts of non-European omics data at the pretraining stage measurably improves equity metrics without harming overall accuracy.
The same logic may apply to other data modalities such as imaging or electronic health records that feed into shared foundation models.
Funding agencies could require ancestry reporting as a condition for dataset deposition to change collection incentives upstream.

Load-bearing premise

That the observed dominance of European-ancestry samples in omics datasets will produce measurable differences in model accuracy or clinical outcomes for other ancestry groups.

What would settle it

A controlled experiment that trains two otherwise identical foundation models, one on current European-heavy omics data and one on a version balanced across ancestries, and then finds no difference in downstream task performance or fairness metrics on held-out non-European cohorts.
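
The comparison step of such an experiment can be sketched as follows. The paper does not run this experiment; the group labels (`EUR`, `AFR`), the predictions, and the choice of accuracy as the metric are all hypothetical and serve only to show what "no difference across groups" would be measured against.

```python
def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy separately for each group label."""
    correct, total = {}, {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (1 if truth == pred else 0)
    return {group: correct[group] / total[group] for group in total}


def max_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy held-out labels and predictions, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["EUR", "EUR", "EUR", "EUR", "AFR", "AFR", "AFR", "AFR"]
pred_skewed = [1, 0, 1, 1, 0, 0, 1, 0]    # hypothetical model pretrained on European-heavy data
pred_balanced = [1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical model pretrained on balanced data

for name, preds in [("skewed", pred_skewed), ("balanced", pred_balanced)]:
    per_group = accuracy_by_group(y_true, preds, groups)
    print(name, per_group, "gap:", max_gap(per_group))
```

With real cohorts the gap would need confidence intervals and a task-appropriate metric; the point here is only that the evaluation is stratified by ancestry rather than pooled.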

Original abstract

Healthcare disparities persist across socioeconomic boundaries, often attributed to unequal access to screening, diagnostics, and therapeutics. However, this perspective highlights that critical biases can emerge much earlier, during data collection and research prioritization, long before clinical implementation in cases where the focus of the studies and the data that is collected is at the molecular level. A vast number of studies focus on collecting omics data but the demographic information associated with these datasets is often not reported in the studies, and when it is reported, it shows big biases. An automated analysis of 4719 PubMed-indexed omics publications from 2015 to 2024 reveals that only a small fraction report ancestry or ethnicity information, with ancestry reporting improving slightly. Analysis of large-scale datasets commonly used for model training, such as CellxGene and GEO, reveals substantial population bias where European-ancestry data dominates. As biomedical foundation models become central to biomedical discovery with a paradigm in which base models are pretrained on large datasets and reusing them time and again for many different downstream tasks, they risk perpetuating or amplifying these early-stage biases, leading to cascading inequities that regulatory interventions cannot fully reverse. We propose a community-wide focus on three foundational principles: Provenance, Openness, and Evaluation Transparency to improve equity and robustness in biomedical AI. This approach aims to foster biomedical innovation that more effectively serves underserved populations and improves health outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This perspective gives concrete counts on ancestry underreporting in omics papers and datasets but assumes rather than demonstrates that the bias creates irreversible downstream healthcare disparities.

This perspective paper's main value is the automated scan of 4719 PubMed-indexed omics publications from 2015 to 2024. It shows that only a small fraction report ancestry or ethnicity, with a slight improvement over the years. The checks on CellxGene and GEO add numbers confirming European-ancestry dominance in the data commonly used for training. Those counts turn a familiar concern into something citable and current.

Referee Report

2 major / 2 minor

Summary. The manuscript is a perspective arguing that biases arise early in biomedical research during omics data collection and prioritization. An automated analysis of 4719 PubMed-indexed omics publications (2015-2024) finds low rates of ancestry/ethnicity reporting (with modest improvement over time), while inspection of CellxGene and GEO datasets shows strong European-ancestry dominance. The authors contend that, under the foundation-model paradigm of large-scale pretraining followed by repeated downstream reuse, these early biases will be perpetuated or amplified, producing cascading healthcare disparities that later regulatory interventions cannot fully reverse. They advocate three community principles—Provenance, Openness, and Evaluation Transparency—to improve equity and robustness.

Significance. If the causal pathway from dataset demographics to irreversible downstream disparities is substantiated, the perspective would usefully direct attention to upstream data practices in biomedical AI. The concrete counts from the 4719-publication corpus and the two large public repositories supply a tangible empirical anchor for the bias observation, which is a clear strength. The proposed principles offer a practical, non-regulatory framing that could influence data-sharing norms and model documentation standards.

major comments (2)

[Abstract] Abstract and the paragraph introducing the foundation-model paradigm: the central claim that pretraining on ancestry-biased omics data will produce 'cascading inequities that regulatory interventions cannot fully reverse' is asserted without direct empirical support or simulation inside the manuscript. The 4719-publication counts and CellxGene/GEO inspections establish the existence of reporting gaps and population imbalance, but no biomedical-specific evidence, ablation, or outcome-linked analysis demonstrates that these translate into ancestry-linked performance gaps in foundation models or into health inequities immune to later mitigation.
[Foundation-model risk discussion] Section discussing risks to downstream tasks: the mechanism by which European dominance in pretraining corpora is expected to propagate into measurable disparities for non-European populations in clinical AI applications is described at a high level but not instantiated with any concrete example, performance metric, or reference to a controlled study within the paper, leaving the load-bearing causal step untested.

minor comments (2)

[Automated analysis of publications] The automated-analysis subsection would benefit from explicit reporting of the PubMed query string, the exact criteria or classifier used to flag ancestry mentions, and any validation steps (e.g., manual review of a sample), which are necessary for reproducibility of the 4719-paper statistics.
[Conclusion] The manuscript would be strengthened by a brief discussion of how the three proposed principles (Provenance, Openness, Evaluation Transparency) could be operationalized in existing data repositories or model cards, moving from high-level recommendation to actionable guidance.
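
To illustrate the kind of detail the first minor comment asks for, here is a minimal sketch of a keyword-based flagging rule applied to abstracts. The term list, the function names, and the toy abstracts are assumptions made for this example; the paper's actual PubMed query and classification criteria are not given in the material reviewed here and may differ substantially.

```python
import re

# Hypothetical term list; not the paper's actual criteria.
ANCESTRY_TERMS = [
    r"ancestr(?:y|ies|al)",
    r"ethnic(?:ity|ities)?",
    r"race",
    r"racial",
]
PATTERN = re.compile(r"\b(?:" + "|".join(ANCESTRY_TERMS) + r")\b", re.IGNORECASE)


def mentions_ancestry(abstract: str) -> bool:
    """Flag an abstract if it contains any of the ancestry-related terms."""
    return PATTERN.search(abstract) is not None


def reporting_rate_by_year(papers):
    """papers: iterable of (year, abstract). Returns {year: fraction flagged}."""
    flagged, total = {}, {}
    for year, abstract in papers:
        total[year] = total.get(year, 0) + 1
        flagged[year] = flagged.get(year, 0) + int(mentions_ancestry(abstract))
    return {year: flagged[year] / total[year] for year in sorted(total)}


# Toy abstracts, invented for illustration only.
papers = [
    (2015, "We profiled gene expression in tumour samples."),
    (2015, "Participants of European ancestry were sequenced."),
    (2024, "Single-cell atlas stratified by self-reported ethnicity."),
    (2024, "Trace metal exposure alters methylation."),
]
rates = reporting_rate_by_year(papers)
print(rates)
```

A rule this simple will produce false positives and false negatives (a term can appear without any cohort description, and cohorts can be described without these terms), which is why the manual validation on a sample requested by the referee matters for interpreting the reported counts.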

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and have revised the manuscript to qualify our claims more carefully, add supporting literature citations, and expand the discussion of mechanisms while preserving the perspective's focus on upstream data practices.

Referee: [Abstract] Abstract and the paragraph introducing the foundation-model paradigm: the central claim that pretraining on ancestry-biased omics data will produce 'cascading inequities that regulatory interventions cannot fully reverse' is asserted without direct empirical support or simulation inside the manuscript. The 4719-publication counts and CellxGene/GEO inspections establish the existence of reporting gaps and population imbalance, but no biomedical-specific evidence, ablation, or outcome-linked analysis demonstrates that these translate into ancestry-linked performance gaps in foundation models or into health inequities immune to later mitigation.

Authors: We agree that the manuscript, as a perspective, does not include original empirical simulations, ablations, or outcome-linked analyses demonstrating the full causal translation from biased pretraining data to irreversible downstream disparities. Our contribution centers on documenting the upstream imbalances via the PubMed corpus analysis and repository inspections, then linking these to the foundation-model reuse paradigm. In revision, we have softened the abstract and introduction to describe a 'risk of perpetuating or amplifying biases, potentially leading to cascading inequities that may prove difficult to fully reverse through later interventions alone.' We have also added citations to studies documenting ancestry-linked performance gaps in genomic and single-cell AI models to provide indirect support for the mechanism. (Revision: yes)
Referee: [Foundation-model risk discussion] Section discussing risks to downstream tasks: the mechanism by which European dominance in pretraining corpora is expected to propagate into measurable disparities for non-European populations in clinical AI applications is described at a high level but not instantiated with any concrete example, performance metric, or reference to a controlled study within the paper, leaving the load-bearing causal step untested.

Authors: We acknowledge that the original discussion of propagation remained conceptual. The revised manuscript expands this section with concrete examples and references drawn from the literature, including documented reductions in accuracy for polygenic risk scores and variant interpretation models when applied to non-European ancestry groups after European-dominant pretraining, as well as ancestry biases observed in cell-type annotation from single-cell omics foundation models. These additions instantiate the mechanism with specific performance considerations while clarifying that the degree of irreversibility depends on the feasibility of downstream mitigation. (Revision: yes)

Circularity Check

0 steps flagged

No circularity: claims rest on external dataset analysis and logical inference, not self-referential derivations.

The paper performs an automated count of ancestry reporting in 4719 PubMed omics papers (2015-2024) and inspects demographic composition in CellxGene and GEO. It then reasons that foundation-model pretraining on such data may perpetuate biases into downstream tasks. This is observational reporting plus perspective, with no equations, fitted parameters, self-defined terms, or load-bearing self-citations that reduce the central claim to its own inputs by construction. The causal extrapolation to irreversible inequities is an interpretive step, not a mathematical reduction. No steps match the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central argument depends on the assumption that data biases propagate through foundation models and that regulation cannot reverse them; no free parameters or new entities are introduced.

axioms (2)

[domain assumption] Biases present at data collection will be perpetuated or amplified when models are pretrained on large omics datasets and reused for downstream tasks.
Invoked to justify the risk of cascading inequities.
[ad hoc to paper] Regulatory interventions cannot fully reverse early-stage biases once embedded in foundation models.
Stated directly as a premise for why upstream focus is needed.
