DeepBD: A Grounded Agentic Workflow for Variant Prioritization and Diagnosis of Genetic Birth Defects

Genjie Li; Haishuai Wang; Jiajun Bu; Jiajun Yu; Jielong Lu; Shiyu Li; Weiran Liao; Zeyu Chu; Zhihao Wu; Ziqi Yan

arxiv: 2606.24779 · v1 · pith:SNWX7ASBnew · submitted 2026-06-23 · 🧬 q-bio.GN · cs.AI

DeepBD: A Grounded Agentic Workflow for Variant Prioritization and Diagnosis of Genetic Birth Defects

Shiyu Li , Ziqi Yan , Zhihao Wu , Jielong Lu , Weiran Liao , Jiajun Yu , Genjie Li , Zeyu Chu

show 2 more authors

Jiajun Bu Haishuai Wang

This is my paper

Pith reviewed 2026-06-25 21:28 UTC · model grok-4.3

classification 🧬 q-bio.GN cs.AI

keywords variant prioritizationgenetic birth defectsagentic workflowexome sequencingevidence integrationdiagnostic interpretation

0 comments

The pith

A grounded agentic workflow called DeepBD ranks genetic variants for birth defects by combining rule evidence, mechanistic context and specialist modules into layered prioritization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DeepBD as a workflow that uses LLM-assisted case structuring, a pretrained evidence engine scoring variants from rules, sequences and phenotype context, plus specialist modules and a diagnostic review layer. It reports Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out benchmark from an 18,622-case cohort, beating Exomiser, DeepRare and prompted LLM baselines on top-20 candidates. A sympathetic reader would care because exome sequencing often leaves clinicians with many candidate variants under incomplete fetal phenotypes, and better ranking can reduce time to diagnosis. Ablation results indicate that rule evidence, mechanistic context and specialist refinement supply complementary signals.

Core claim

DeepBD organizes variant prioritization and diagnostic interpretation into LLM-assisted case structuring, a pretrained evidence engine that learns patient-specific scores from structured rule evidence, sequence and variant-effect representations and phenotype-conditioned biological context, specialist evidence modules for tool-based refinement, and a grounded diagnostic review layer, achieving the reported recall rates on held-out solved cases while outperforming the listed baselines.

What carries the argument

The grounded agentic workflow that separates evidence integration from tool-based refinement and LLM-assisted diagnostic review.

If this is right

Rule evidence, mechanistic context and specialist refinement provide complementary signals for variant ranking.
The workflow outperforms standalone Exomiser, DeepRare and prompted LLM reranking when applied to Exomiser-derived top-20 candidate lists.
Ablation and overlap analyses confirm that each layer contributes distinct information to the final prioritization.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The separation of evidence integration, refinement and review layers may transfer to variant interpretation tasks in other Mendelian disorders.
Adding direct functional assay data or real-time clinical notes could further improve the evidence engine scores.
Clinical adoption would require prospective testing on diverse populations to check whether the internal benchmark performance holds outside the development cohort.

Load-bearing premise

The internal held-out benchmark from the authors' 18,622-case cohort is representative of real-world clinical cases and free of selection bias or data leakage.

What would settle it

Evaluation of the same workflow on an independent external cohort of solved genetic birth defect cases in which recall at rank 1, 3, 5 or 10 does not exceed the performance of Exomiser or the other baselines.

Figures

Figures reproduced from arXiv: 2606.24779 by Genjie Li, Haishuai Wang, Jiajun Bu, Jiajun Yu, Jielong Lu, Shiyu Li, Weiran Liao, Zeyu Chu, Zhihao Wu, Ziqi Yan.

**Figure 1.** Figure 1: DeepBD workflow and cohort landscape. a, Workflow for genetic birth defect interpretation from clinical-genomic input to ranked variants and diagnosis-oriented evidence synthesis. b, Biomedical graph context linking phenotype, cellular context, pathway, gene and variant evidence. c,d, Phenotype and variant composition of the in-house fetal and infant cohort. e, Example phenotype-cell-gene-variant-pathway f… view at source ↗

**Figure 2.** Figure 2: Benchmarking DeepBD for causal variant prioritization. a, Overall Recall@1, Recall@3, Recall@5 and Recall@10 for DeepBD, standalone Exomiser, DeepRare and prompted LLM reranking baselines. DeepBD achieved 0.658, 0.882, 0.912 and 0.929, respectively. b, Recall@5 stratified by phenotype category, showing performance across major fetal and infant birth-defect presentations. weighting after candidate prepriori… view at source ↗

**Figure 3.** Figure 3: Evidence grounding, complementarity and ablation. a, Distribution of causal-variant ranks for DeepBD and baseline methods. b, Case-level evidence landscape summarizing the relative contribution of HPO, rule, mechanism, structure and mixed evidence patterns. c, Ablation analysis. Removing rule, graph-derived mechanism or structure-informed evidence decreases Recall@K, indicating that DeepBD relies on multi-… view at source ↗

read the original abstract

Birth defects are a major cause of fetal loss, neonatal morbidity and long-term disability. In the subset with suspected genetic etiologies, exome and genome sequencing have moved many cases from variant detection to post-sequencing interpretation: clinicians must rank patient-specific candidate variants under incomplete fetal or infant phenotypes and heterogeneous evidence from population genetics, variant-effect prediction, gene-disease validity, phenotype ontologies, cellular and pathway context, protein structure and clinical literature. We present DeepBD, a grounded agentic workflow for variant prioritization and diagnostic interpretation of genetic birth defects. DeepBD organizes the workflow into LLM-assisted case structuring, a pretrained evidence engine, specialist evidence modules and a grounded diagnostic review layer. The evidence engine learns patient-specific variant scores from structured rule evidence, sequence and variant-effect representations and phenotype-conditioned biological context, whereas specialist modules and the agentic layer provide tool-based refinement, candidate-pool review and diagnosis-oriented synthesis from ranked candidates. Developed using an in-house fetal and infant cohort comprising 18,622 cases, DeepBD achieved Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out solved-case benchmark, outperforming standalone Exomiser, DeepRare and prompted LLM reranking baselines evaluated on Exomiser-derived top-20 candidate variants. Ablation and overlap analyses show that rule evidence, mechanistic context, and specialist refinement provide complementary signals. These findings support a grounded agentic workflow that separates evidence integration, tool-based refinement, and LLM-assisted diagnostic review for retrospective variant prioritization in genetic birth defects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DeepBD organizes an agentic workflow for birth defect variant prioritization but reports all results on internal held-out data from its own 18k-case cohort.

read the letter

The main point is that DeepBD's reported recalls of 0.658/0.882/0.912/0.929 at ranks 1/3/5/10 come from an internal held-out benchmark on the authors' 18,622-case development cohort, with no external validation or split details provided.

The paper does lay out a clear workflow: LLM-assisted case structuring, a pretrained evidence engine that pulls in rule evidence, sequence representations, variant effects, and phenotype-conditioned context, plus specialist modules for refinement and a final grounded review layer. The ablations show that rule evidence, mechanistic signals, and specialist steps add distinct value, which is a practical observation for anyone integrating these sources.

The evaluation setup is the clear weak point. Training and testing on the same in-house cohort without reported checks for patient overlap, variant leakage, or phenotype stratification means the numbers could be inflated by data distribution effects rather than independent performance. The stress-test note on this risk is accurate based on the abstract. Baselines are also narrow, limited to Exomiser top-20 candidates.

This is for researchers working on AI tools for neonatal genetics or rare disease variant interpretation who want an example of how to structure agentic evidence integration. A reader seeking ready-to-deploy tools would need more validation first.

The paper shows straightforward thinking about the evidence types and workflow components, so it deserves peer review. Reviewers can require external cohorts and clearer split methodology.

Referee Report

3 major / 2 minor

Summary. The manuscript presents DeepBD, a grounded agentic workflow for variant prioritization and diagnostic interpretation of genetic birth defects. It combines LLM-assisted case structuring, a pretrained evidence engine that learns patient-specific variant scores from structured rule evidence, sequence/variant-effect representations and phenotype-conditioned biological context, specialist evidence modules for tool-based refinement, and a grounded diagnostic review layer. Developed on an in-house cohort of 18,622 fetal/infant cases, DeepBD reports Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out solved-case benchmark, outperforming standalone Exomiser, DeepRare and prompted LLM reranking baselines (evaluated on Exomiser top-20 candidates). Ablation and overlap analyses indicate complementary signals from rule evidence, mechanistic context and specialist refinement.

Significance. If the reported performance generalizes, the separation of evidence integration from tool-based refinement and LLM-assisted synthesis offers a structured approach to handling heterogeneous data sources in birth-defect variant interpretation. The ablations demonstrating non-redundant contributions from different modules are a constructive element of the work.

major comments (3)

[Evaluation / Results] Evaluation section (and Methods): The held-out benchmark is drawn exclusively from the same 18,622-case in-house development cohort, with no reported patient-level de-duplication, variant-overlap checks between train and test, phenotype stratification of the split, or exclusion criteria. This directly undermines the independence required for the superiority claim over baselines.
[Results / Methods] Results and Methods: All performance numbers (including the Recall@K values and comparisons to Exomiser/DeepRare/LLM baselines) are reported only on internal held-out data; no external validation cohort, no error bars or confidence intervals, and no details on the evidence-engine training procedure or hyper-parameter selection are provided.
[Abstract / Results] Abstract and Results: The central claim that DeepBD 'outperforms' the baselines rests on a benchmark whose construction risks circularity, because the evidence engine learns scores from the identical cohort distribution used for the held-out test; without independent data this does not establish generalizability.

minor comments (2)

[Methods] Clarify in the Methods whether the 'pretrained' evidence engine uses any data sources external to the 18,622-case cohort or is trained entirely in-house.
[Methods / Supplementary] Add patient-level statistics (e.g., number of unique probands, phenotype distributions) for both the development and held-out sets to allow assessment of selection bias.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for their thorough and constructive review. We address each major comment below, indicating planned revisions to strengthen the description of the evaluation methodology while remaining faithful to the work performed.

read point-by-point responses

Referee: [Evaluation / Results] Evaluation section (and Methods): The held-out benchmark is drawn exclusively from the same 18,622-case in-house development cohort, with no reported patient-level de-duplication, variant-overlap checks between train and test, phenotype stratification of the split, or exclusion criteria. This directly undermines the independence required for the superiority claim over baselines.

Authors: We agree that explicit documentation of the split construction is required. In the revised manuscript we will expand the Methods section to detail the patient-level splitting procedure, including de-duplication steps, variant-overlap checks between training and test partitions, phenotype stratification, and exclusion criteria applied to the 18,622-case cohort. These additions will clarify the degree of independence achieved. revision: yes
Referee: [Results / Methods] Results and Methods: All performance numbers (including the Recall@K values and comparisons to Exomiser/DeepRare/LLM baselines) are reported only on internal held-out data; no external validation cohort, no error bars or confidence intervals, and no details on the evidence-engine training procedure or hyper-parameter selection are provided.

Authors: We will revise the Results and Methods sections to report error bars or confidence intervals on the Recall@K metrics, and to provide a full description of the evidence-engine training procedure together with the hyper-parameter selection protocol. A new limitations subsection will explicitly state that all reported results are internal to the development cohort. revision: yes
Referee: [Abstract / Results] Abstract and Results: The central claim that DeepBD 'outperforms' the baselines rests on a benchmark whose construction risks circularity, because the evidence engine learns scores from the identical cohort distribution used for the held-out test; without independent data this does not establish generalizability.

Authors: We will revise the Abstract and Results sections to qualify all performance claims as applying to an internal held-out benchmark drawn from the same development cohort. We will also add explicit discussion of the shared distributional characteristics and the consequent limits on claims of generalizability, thereby addressing the circularity concern by narrowing the scope of the stated conclusions. revision: yes

standing simulated objections not resolved

We do not currently possess an independent external validation cohort for this specialized fetal/infant birth-defect application and therefore cannot supply external performance numbers without new data collection.

Circularity Check

1 steps flagged

Internal held-out benchmark from same 18,622-case development cohort makes reported recalls fitted quantities rather than independent predictions

specific steps

fitted input called prediction [Abstract]
"Developed using an in-house fetal and infant cohort comprising 18,622 cases, DeepBD achieved Recall@1/3/5/10 of 0.658/0.882/0.912/0.929 on an internal held-out solved-case benchmark, outperforming standalone Exomiser, DeepRare and prompted LLM reranking baselines evaluated on Exomiser-derived top-20 candidate variants."

The evidence engine learns patient-specific variant scores from the 18,622-case cohort; the quoted recalls are then measured on held-out cases from that same cohort. The performance numbers are therefore quantities obtained by fitting to the authors' data distribution and evaluating within it, rather than independent predictions on external data.

full rationale

The paper's central empirical claim is the Recall@1/3/5/10 performance on an internal held-out solved-case benchmark drawn from the identical in-house cohort used to develop the evidence engine. This reduces the reported numbers to performance on data the model was fitted to, matching the fitted_input_called_prediction pattern. No external cohort or independent validation set is described in the provided text, and the baselines are also evaluated on the same internal candidates. The workflow description (LLM structuring, specialist modules, agentic review) does not itself reduce to a self-definition or self-citation chain, so the circularity is partial and confined to the performance claim.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields minimal ledger; the central performance claim rests on the representativeness of the internal cohort and the assumption that learned scores generalize beyond the training distribution.

free parameters (1)

evidence engine model parameters
The pretrained engine learns patient-specific variant scores from the 18,622-case cohort.

axioms (1)

domain assumption The held-out split from the in-house cohort is unbiased and representative of external clinical cases
Benchmark metrics assume the internal data distribution matches real-world use.

pith-pipeline@v0.9.1-grok · 5856 in / 1401 out tokens · 38198 ms · 2026-06-25T21:28:18.319096+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

44 extracted references · 1 canonical work pages

[1]

Congenital disorders

World Health Organization. Congenital disorders. https://www.who.int/news-room/fact-sheets/detail/birth-defects (2023). Accessed 18 June 2026. 9/16

2023
[2]

Data and statistics on birth defects

Centers for Disease Control and Prevention. Data and statistics on birth defects. https://www.cdc.gov/birth-defects/ data-research/facts-stats/index.html (2026). Accessed 18 June 2026

2026
[3]

About congenital anomalies

Eunice Kennedy Shriver National Institute of Child Health and Human Development. About congenital anomalies. https://www.nichd.nih.gov/health/topics/factsheets/congenital-anomalies (2024). Accessed 18 June 2026

2024
[4]

T.et al.Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies.Am

Miller, D. T.et al.Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies.Am. J. Hum. Genet.86, 749–764 (2010)

2010
[5]

G., Leach, N

Monaghan, K. G., Leach, N. T., Pekarek, D., Prasad, P. & Rose, N. C. The use of fetal exome sequencing in prenatal diagnosis: a points to consider document of the American College of Medical Genetics and Genomics.Genet. Med.22, 675–680 (2020)

2020
[6]

V ora, N. L. & Norton, M. E. Prenatal exome and genome sequencing for fetal structural abnormalities.Am. J. Obstet. Gynecol.228, 140–149 (2023)

2023
[7]

Large-scale discovery of novel genetic causes of developmental disorders

Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature519, 223–228 (2015)

2015
[8]

F.et al.Prevalence and architecture of de novo mutations in developmental disorders.Nature542, 433–438 (2017)

McRae, J. F.et al.Prevalence and architecture of de novo mutations in developmental disorders.Nature542, 433–438 (2017)

2017
[9]

Lord, J.et al.Prenatal exome sequencing analysis in fetal structural anomalies detected by ultrasonography (PAGE): a cohort study.The Lancet393, 747–757 (2019)

2019
[10]

The Lancet393, 758–767 (2019)

Petrovski, S.et al.Whole-exome sequencing in the evaluation of fetal structural anomalies: a prospective cohort study. The Lancet393, 758–767 (2019)

2019
[11]

Fu, F.et al.Application of exome sequencing for prenatal diagnosis of fetal structural anomalies: clinical experience and lessons learned from a cohort of 1618 fetuses.Genome Med.14, 123 (2022)

2022
[12]

& Chitty, L

Mellis, R., Oprych, K., Scotchman, E., Hill, M. & Chitty, L. S. Diagnostic yield of exome sequencing for prenatal diagnosis of fetal structural anomalies: a systematic review and meta-analysis.Prenat. Diagn.42, 662–685 (2022)

2022
[13]

H.et al.Diagnostic yield of pediatric and prenatal exome sequencing in a diverse population.npj Genomic Med.8, 34 (2023)

Wojcik, M. H.et al.Diagnostic yield of pediatric and prenatal exome sequencing in a diverse population.npj Genomic Med.8, 34 (2023)

2023
[14]

N.et al.The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease.Am

Robinson, P. N.et al.The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease.Am. J. Hum. Genet.83, 610–615 (2008). 15.Köhler, S.et al.The Human Phenotype Ontology in 2021.Nucleic Acids Res.49, D1207–D1217 (2021)

2008
[15]

A.et al.The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.Nucleic Acids Res.48, D704–D715 (2020)

Shefchek, K. A.et al.The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.Nucleic Acids Res.48, D704–D715 (2020)

2019
[16]

H., Buske, O

Fujiwara, T., Yamamoto, Y ., Kim, J. H., Buske, O. J. & Takagi, T. PubCaseFinder: a case-report-based, phenotype-driven differential-diagnosis system for rare diseases.Am. J. Hum. Genet.103, 389–399 (2018)

2018
[17]

Javed, A., Agrawal, S. & Ng, P. C. Phen-gen: combining phenotype and genotype to analyze rare disorders.Nat. Methods 11, 935–937 (2014)

2014
[18]

Protoc.10, 2004–2015 (2015)

Smedley, D.et al.Next-generation diagnostics and disease-gene discovery with the Exomiser.Nat. Protoc.10, 2004–2015 (2015)

2004
[19]

D., Ma, X

Li, Q., Zhao, K., Bustamante, C. D., Ma, X. & Wong, W. H. Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis.Genet. Med.21, 2126–2134 (2019)

2019
[20]

N.et al.Interpretable clinical genomics with a likelihood ratio paradigm.Am

Robinson, P. N.et al.Interpretable clinical genomics with a likelihood ratio paradigm.Am. J. Hum. Genet.107, 403–417 (2020)

2020
[21]

Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function.Nucleic Acids Res.31, 3812–3814 (2003)

2003
[22]

A.et al.A method and server for predicting damaging missense mutations.Nat

Adzhubei, I. A.et al.A method and server for predicting damaging missense mutations.Nat. Methods7, 248–249 (2010)

2010
[23]

Genet.46, 310–315 (2014)

Kircher, M.et al.A general framework for estimating the relative pathogenicity of human genetic variants.Nat. Genet.46, 310–315 (2014)

2014
[24]

M., Rödelsperger, C., Schuelke, M

Schwarz, J. M., Rödelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations.Nat. Methods7, 575–576 (2010). 10/16

2010
[25]

M.et al.REVEL: an ensemble method for predicting the pathogenicity of rare missense variants.Am

Ioannidis, N. M.et al.REVEL: an ensemble method for predicting the pathogenicity of rare missense variants.Am. J. Hum. Genet.99, 877–885 (2016)

2016
[26]

Genet.50, 1161–1170 (2018)

Sundaram, L.et al.Predicting the clinical impact of human mutation with deep neural networks.Nat. Genet.50, 1161–1170 (2018). 28.Jaganathan, K.et al.Predicting splicing from primary sequence with deep learning.Cell176, 535–548.e24 (2019)

2018
[27]

Cheng, J.et al.Accurate proteome-wide missense variant effect prediction with AlphaMissense.Science381, eadg7492 (2023)

2023
[28]

Med.17, 405–424 (2015)

Richards, S.et al.Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the ACMG and the AMP.Genet. Med.17, 405–424 (2015)

2015
[29]

T.et al.Evaluating the clinical validity of gene-disease associations: an evidence-based framework developed by the Clinical Genome Resource.Genet

Strande, N. T.et al.Evaluating the clinical validity of gene-disease associations: an evidence-based framework developed by the Clinical Genome Resource.Genet. Med.19, 896–906 (2017)

2017
[30]

J.et al.ClinVar: public archive of interpretations of clinically relevant variants.Nucleic Acids Res.44, D862–D868 (2016)

Landrum, M. J.et al.ClinVar: public archive of interpretations of clinically relevant variants.Nucleic Acids Res.44, D862–D868 (2016)

2016
[31]

D.et al.The Human Gene Mutation Database (HGMD): optimizing its use in a clinical diagnostic or research setting.Hum

Stenson, P. D.et al.The Human Gene Mutation Database (HGMD): optimizing its use in a clinical diagnostic or research setting.Hum. Genet.139, 1197–1207 (2020)

2020
[32]

S., Bocchini, C

Amberger, J. S., Bocchini, C. A., Scott, A. F. & Hamosh, A. OMIM.org: leveraging knowledge across phenotype-gene relationships.Nucleic Acids Res.47, D1038–D1043 (2019)

2019
[33]

N., Gkoutos, G

Boudellioua, I., Kulmanov, M., Schofield, P. N., Gkoutos, G. V . & Hoehndorf, R. DeepPVP: phenotype-based prioritization of causative variants using deep learning.BMC Bioinformatics20, 65 (2019)

2019
[34]

C.et al.Deep structured learning for variant prioritization in Mendelian diseases.Nat

Danzi, M. C.et al.Deep structured learning for variant prioritization in Mendelian diseases.Nat. Commun.14, 4167 (2023)

2023
[35]

Mao, D.et al.AI-MARRVEL: a knowledge-driven AI system for diagnosing Mendelian disorders.NEJM AI1, AIoa2300009 (2024)

2024
[36]

Alsentzer, E.et al.Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases.npj Digit. Med. 8, 380 (2025). 39.Singhal, K.et al.Large language models encode clinical knowledge.Nature620, 172–180 (2023). 40.Tu, T.et al.Towards conversational diagnostic artificial intelligence.Nature642, 442–450 (2025). 41.Zhao, W.et al.An agen...

2025
[37]

Preprint at https://arxiv.org/abs/ 2605.06226 (2026)

Liu, T.et al.A versatile AI agent for rare disease diagnosis and risk gene prioritization. Preprint at https://arxiv.org/abs/ 2605.06226 (2026)

Pith/arXiv arXiv 2026
[38]

Preprint at https://doi.org/10.64898/2026.04.02.26349929 (2026)

Meng, M., Liu, L., Du, Q.et al.Berrylyzer: an efficient, traceable, and lightweight intelligent agentic system for prenatal genetic diagnosis. Preprint at https://doi.org/10.64898/2026.04.02.26349929 (2026)

work page doi:10.64898/2026.04.02.26349929 2026
[39]

J.et al.The mutational constraint spectrum quantified from variation in 141,456 humans.Nature581, 434–443 (2020)

Karczewski, K. J.et al.The mutational constraint spectrum quantified from variation in 141,456 humans.Nature581, 434–443 (2020)

2020
[40]

46.Veli ˇckovi´c, P.et al.Graph Attention Networks

Nguyen, E.et al.Sequence modeling and design from molecular to genome scale with Evo.Science386, eado9336 (2024). 46.Veli ˇckovi´c, P.et al.Graph Attention Networks. InInternational Conference on Learning Representations(2018)

2024
[41]

The GTEx Consortium atlas of genetic regulatory effects across human tissues.Science369, 1318–1330 (2020)

GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues.Science369, 1318–1330 (2020). 48.Milacic, M.et al.The Reactome pathway knowledgebase 2024.Nucleic Acids Res.52, D672–D678 (2024). 49.Du, J.et al.Gene2vec: distributed representation of genes based on co-expression.BMC Genomics20, 82 (2019). 50.Jumper, J.et al.Hig...

2020
[42]

Varadi, M.et al.AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences.Nucleic Acids Res.52, D368–D375 (2024)

2024
[43]

J.et al.RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite.PLoS ONE6, e20161 (2011)

Fleishman, S. J.et al.RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite.PLoS ONE6, e20161 (2011)

2011
[44]

F.et al.The Rosetta all-atom energy function for macromolecular modeling and design.J

Alford, R. F.et al.The Rosetta all-atom energy function for macromolecular modeling and design.J. Chem. Theory Comput.13, 3031–3048 (2017). 11/16 54.Lek, M.et al.Analysis of protein-coding genetic variation in 60,706 humans.Nature536, 285–291 (2016). 55.The 1000 Genomes Project Consortium. A global reference for human genetic variation.Nature526, 68–74 (2...

2017

[1] [1]

Congenital disorders

World Health Organization. Congenital disorders. https://www.who.int/news-room/fact-sheets/detail/birth-defects (2023). Accessed 18 June 2026. 9/16

2023

[2] [2]

Data and statistics on birth defects

Centers for Disease Control and Prevention. Data and statistics on birth defects. https://www.cdc.gov/birth-defects/ data-research/facts-stats/index.html (2026). Accessed 18 June 2026

2026

[3] [3]

About congenital anomalies

Eunice Kennedy Shriver National Institute of Child Health and Human Development. About congenital anomalies. https://www.nichd.nih.gov/health/topics/factsheets/congenital-anomalies (2024). Accessed 18 June 2026

2024

[4] [4]

T.et al.Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies.Am

Miller, D. T.et al.Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies.Am. J. Hum. Genet.86, 749–764 (2010)

2010

[5] [5]

G., Leach, N

Monaghan, K. G., Leach, N. T., Pekarek, D., Prasad, P. & Rose, N. C. The use of fetal exome sequencing in prenatal diagnosis: a points to consider document of the American College of Medical Genetics and Genomics.Genet. Med.22, 675–680 (2020)

2020

[6] [6]

V ora, N. L. & Norton, M. E. Prenatal exome and genome sequencing for fetal structural abnormalities.Am. J. Obstet. Gynecol.228, 140–149 (2023)

2023

[7] [7]

Large-scale discovery of novel genetic causes of developmental disorders

Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature519, 223–228 (2015)

2015

[8] [8]

F.et al.Prevalence and architecture of de novo mutations in developmental disorders.Nature542, 433–438 (2017)

McRae, J. F.et al.Prevalence and architecture of de novo mutations in developmental disorders.Nature542, 433–438 (2017)

2017

[9] [9]

Lord, J.et al.Prenatal exome sequencing analysis in fetal structural anomalies detected by ultrasonography (PAGE): a cohort study.The Lancet393, 747–757 (2019)

2019

[10] [10]

The Lancet393, 758–767 (2019)

Petrovski, S.et al.Whole-exome sequencing in the evaluation of fetal structural anomalies: a prospective cohort study. The Lancet393, 758–767 (2019)

2019

[11] [11]

Fu, F.et al.Application of exome sequencing for prenatal diagnosis of fetal structural anomalies: clinical experience and lessons learned from a cohort of 1618 fetuses.Genome Med.14, 123 (2022)

2022

[12] [12]

& Chitty, L

Mellis, R., Oprych, K., Scotchman, E., Hill, M. & Chitty, L. S. Diagnostic yield of exome sequencing for prenatal diagnosis of fetal structural anomalies: a systematic review and meta-analysis.Prenat. Diagn.42, 662–685 (2022)

2022

[13] [13]

H.et al.Diagnostic yield of pediatric and prenatal exome sequencing in a diverse population.npj Genomic Med.8, 34 (2023)

Wojcik, M. H.et al.Diagnostic yield of pediatric and prenatal exome sequencing in a diverse population.npj Genomic Med.8, 34 (2023)

2023

[14] [14]

N.et al.The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease.Am

Robinson, P. N.et al.The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease.Am. J. Hum. Genet.83, 610–615 (2008). 15.Köhler, S.et al.The Human Phenotype Ontology in 2021.Nucleic Acids Res.49, D1207–D1217 (2021)

2008

[15] [15]

A.et al.The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.Nucleic Acids Res.48, D704–D715 (2020)

Shefchek, K. A.et al.The Monarch Initiative in 2019: an integrative data and analytic platform connecting phenotypes to genotypes across species.Nucleic Acids Res.48, D704–D715 (2020)

2019

[16] [16]

H., Buske, O

Fujiwara, T., Yamamoto, Y ., Kim, J. H., Buske, O. J. & Takagi, T. PubCaseFinder: a case-report-based, phenotype-driven differential-diagnosis system for rare diseases.Am. J. Hum. Genet.103, 389–399 (2018)

2018

[17] [17]

Javed, A., Agrawal, S. & Ng, P. C. Phen-gen: combining phenotype and genotype to analyze rare disorders.Nat. Methods 11, 935–937 (2014)

2014

[18] [18]

Protoc.10, 2004–2015 (2015)

Smedley, D.et al.Next-generation diagnostics and disease-gene discovery with the Exomiser.Nat. Protoc.10, 2004–2015 (2015)

2004

[19] [19]

D., Ma, X

Li, Q., Zhao, K., Bustamante, C. D., Ma, X. & Wong, W. H. Xrare: a machine learning method jointly modeling phenotypes and genetic evidence for rare disease diagnosis.Genet. Med.21, 2126–2134 (2019)

2019

[20] [20]

N.et al.Interpretable clinical genomics with a likelihood ratio paradigm.Am

Robinson, P. N.et al.Interpretable clinical genomics with a likelihood ratio paradigm.Am. J. Hum. Genet.107, 403–417 (2020)

2020

[21] [21]

Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function.Nucleic Acids Res.31, 3812–3814 (2003)

2003

[22] [22]

A.et al.A method and server for predicting damaging missense mutations.Nat

Adzhubei, I. A.et al.A method and server for predicting damaging missense mutations.Nat. Methods7, 248–249 (2010)

2010

[23] [23]

Genet.46, 310–315 (2014)

Kircher, M.et al.A general framework for estimating the relative pathogenicity of human genetic variants.Nat. Genet.46, 310–315 (2014)

2014

[24] [24]

M., Rödelsperger, C., Schuelke, M

Schwarz, J. M., Rödelsperger, C., Schuelke, M. & Seelow, D. MutationTaster evaluates disease-causing potential of sequence alterations.Nat. Methods7, 575–576 (2010). 10/16

2010

[25] [25]

M.et al.REVEL: an ensemble method for predicting the pathogenicity of rare missense variants.Am

Ioannidis, N. M.et al.REVEL: an ensemble method for predicting the pathogenicity of rare missense variants.Am. J. Hum. Genet.99, 877–885 (2016)

2016

[26] [26]

Genet.50, 1161–1170 (2018)

Sundaram, L.et al.Predicting the clinical impact of human mutation with deep neural networks.Nat. Genet.50, 1161–1170 (2018). 28.Jaganathan, K.et al.Predicting splicing from primary sequence with deep learning.Cell176, 535–548.e24 (2019)

2018

[27] [27]

Cheng, J.et al.Accurate proteome-wide missense variant effect prediction with AlphaMissense.Science381, eadg7492 (2023)

2023

[28] [28]

Med.17, 405–424 (2015)

Richards, S.et al.Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the ACMG and the AMP.Genet. Med.17, 405–424 (2015)

2015

[29] [29]

T.et al.Evaluating the clinical validity of gene-disease associations: an evidence-based framework developed by the Clinical Genome Resource.Genet

Strande, N. T.et al.Evaluating the clinical validity of gene-disease associations: an evidence-based framework developed by the Clinical Genome Resource.Genet. Med.19, 896–906 (2017)

2017

[30] [30]

J.et al.ClinVar: public archive of interpretations of clinically relevant variants.Nucleic Acids Res.44, D862–D868 (2016)

Landrum, M. J.et al.ClinVar: public archive of interpretations of clinically relevant variants.Nucleic Acids Res.44, D862–D868 (2016)

2016

[31] [31]

D.et al.The Human Gene Mutation Database (HGMD): optimizing its use in a clinical diagnostic or research setting.Hum

Stenson, P. D.et al.The Human Gene Mutation Database (HGMD): optimizing its use in a clinical diagnostic or research setting.Hum. Genet.139, 1197–1207 (2020)

2020

[32] [32]

S., Bocchini, C

Amberger, J. S., Bocchini, C. A., Scott, A. F. & Hamosh, A. OMIM.org: leveraging knowledge across phenotype-gene relationships.Nucleic Acids Res.47, D1038–D1043 (2019)

2019

[33] [33]

N., Gkoutos, G

Boudellioua, I., Kulmanov, M., Schofield, P. N., Gkoutos, G. V . & Hoehndorf, R. DeepPVP: phenotype-based prioritization of causative variants using deep learning.BMC Bioinformatics20, 65 (2019)

2019

[34] [34]

C.et al.Deep structured learning for variant prioritization in Mendelian diseases.Nat

Danzi, M. C.et al.Deep structured learning for variant prioritization in Mendelian diseases.Nat. Commun.14, 4167 (2023)

2023

[35] [35]

Mao, D.et al.AI-MARRVEL: a knowledge-driven AI system for diagnosing Mendelian disorders.NEJM AI1, AIoa2300009 (2024)

2024

[36] [36]

Alsentzer, E.et al.Few shot learning for phenotype-driven diagnosis of patients with rare genetic diseases.npj Digit. Med. 8, 380 (2025). 39.Singhal, K.et al.Large language models encode clinical knowledge.Nature620, 172–180 (2023). 40.Tu, T.et al.Towards conversational diagnostic artificial intelligence.Nature642, 442–450 (2025). 41.Zhao, W.et al.An agen...

2025

[37] [37]

Preprint at https://arxiv.org/abs/ 2605.06226 (2026)

Liu, T.et al.A versatile AI agent for rare disease diagnosis and risk gene prioritization. Preprint at https://arxiv.org/abs/ 2605.06226 (2026)

Pith/arXiv arXiv 2026

[38] [38]

Preprint at https://doi.org/10.64898/2026.04.02.26349929 (2026)

Meng, M., Liu, L., Du, Q.et al.Berrylyzer: an efficient, traceable, and lightweight intelligent agentic system for prenatal genetic diagnosis. Preprint at https://doi.org/10.64898/2026.04.02.26349929 (2026)

work page doi:10.64898/2026.04.02.26349929 2026

[39] [39]

J.et al.The mutational constraint spectrum quantified from variation in 141,456 humans.Nature581, 434–443 (2020)

Karczewski, K. J.et al.The mutational constraint spectrum quantified from variation in 141,456 humans.Nature581, 434–443 (2020)

2020

[40] [40]

46.Veli ˇckovi´c, P.et al.Graph Attention Networks

Nguyen, E.et al.Sequence modeling and design from molecular to genome scale with Evo.Science386, eado9336 (2024). 46.Veli ˇckovi´c, P.et al.Graph Attention Networks. InInternational Conference on Learning Representations(2018)

2024

[41] [41]

The GTEx Consortium atlas of genetic regulatory effects across human tissues.Science369, 1318–1330 (2020)

GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues.Science369, 1318–1330 (2020). 48.Milacic, M.et al.The Reactome pathway knowledgebase 2024.Nucleic Acids Res.52, D672–D678 (2024). 49.Du, J.et al.Gene2vec: distributed representation of genes based on co-expression.BMC Genomics20, 82 (2019). 50.Jumper, J.et al.Hig...

2020

[42] [42]

Varadi, M.et al.AlphaFold protein structure database in 2024: providing structure coverage for over 214 million protein sequences.Nucleic Acids Res.52, D368–D375 (2024)

2024

[43] [43]

J.et al.RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite.PLoS ONE6, e20161 (2011)

Fleishman, S. J.et al.RosettaScripts: a scripting language interface to the Rosetta macromolecular modeling suite.PLoS ONE6, e20161 (2011)

2011

[44] [44]

F.et al.The Rosetta all-atom energy function for macromolecular modeling and design.J

Alford, R. F.et al.The Rosetta all-atom energy function for macromolecular modeling and design.J. Chem. Theory Comput.13, 3031–3048 (2017). 11/16 54.Lek, M.et al.Analysis of protein-coding genetic variation in 60,706 humans.Nature536, 285–291 (2016). 55.The 1000 Genomes Project Consortium. A global reference for human genetic variation.Nature526, 68–74 (2...

2017