Federated SPARQL querying for genomic variant functional annotation
Pith reviewed 2026-06-27 22:56 UTC · model grok-4.3
The pith
Genomic variant annotation can be performed with federated SPARQL queries over the remote UniProtKB endpoint while keeping sensitive clinico-genomic data local.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Variant annotation is achieved by linking local clinico-genomic knowledge graphs through federated SPARQL queries to UniProtKB, a curated public knowledge graph of genes and proteins, so that functional information is obtained on demand without duplicating public data or moving sensitive records.
What carries the argument
Federated SPARQL querying, which distributes a single query across multiple knowledge graphs so results are assembled without centralizing the underlying data sources.
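As a rough illustration of the mechanism described above, the sketch below builds a SPARQL 1.1 query whose `SERVICE` clause delegates part of the pattern to the public UniProtKB endpoint. This is not the paper's query (the paper supplies none): the `ex:` namespace, `ex:Variant`, and `ex:geneSymbol` are hypothetical local terms, and joining on the gene symbol is an assumed linking strategy.

```python
# Public UniProt SPARQL endpoint; everything under the ex: prefix is hypothetical.
UNIPROT_ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_federated_query(local_ns: str = "http://example.org/ican#",
                          limit: int = 100) -> str:
    """Return a query joining local variants to UniProtKB function annotations."""
    return f"""
PREFIX ex: <{local_ns}>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?variant ?geneSymbol ?protein ?annotationText
WHERE {{
  # Matched against the local graph (hypothetical vocabulary).
  ?variant a ex:Variant ;
           ex:geneSymbol ?geneSymbol .

  # Delegated to the remote UniProtKB endpoint.
  SERVICE <{UNIPROT_ENDPOINT}> {{
    ?protein up:organism taxon:9606 ;
             up:encodedBy ?gene ;
             up:annotation ?annotation .
    ?gene skos:prefLabel ?geneSymbol .
    ?annotation a up:Function_Annotation ;
                rdfs:comment ?annotationText .
  }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    print(build_federated_query(limit=10))
```

The query would be submitted to the local SPARQL engine, which contacts the remote endpoint itself. Depending on how that engine evaluates `SERVICE`, bound values such as gene symbols may be sent to the remote side, so what leaves the site is worth checking per deployment. The join also assumes the local gene symbol literal matches the remote label exactly.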
If this is right
- Sensitive genomic records remain on the originating site throughout the annotation process.
- Public resources such as UniProtKB are accessed only as needed rather than copied in full.
- Ontology-based modeling of local data supports consistent FAIR-compliant sharing and reuse.
Where Pith is reading between the lines
- The same federated pattern could be applied to annotation tasks that draw from additional public knowledge graphs beyond UniProtKB.
- Query performance and result completeness would need direct benchmarking against conventional local duplication pipelines to quantify trade-offs.
- Expanding the local graph with further ontologies could support additional annotation types such as pathway or phenotype links.
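The benchmarking point above could start from a small timing harness such as the following sketch. It assumes the caller supplies a `run_query` callable (hypothetical; for instance a wrapper around whatever SPARQL client the site uses) and records wall-clock time, row count, and failures per query. It says nothing about how the paper's authors measured, or will measure, performance.

```python
import time
from typing import Callable, Iterable


def time_queries(run_query: Callable[[str], list],
                 queries: Iterable[str]) -> list[dict]:
    """Run each query once and record duration, row count, and any error."""
    results = []
    for query in queries:
        start = time.perf_counter()
        try:
            rows = run_query(query)
            outcome = {"ok": True, "rows": len(rows), "error": None}
        except Exception as exc:  # keep going so failures can be tallied
            outcome = {"ok": False, "rows": 0, "error": repr(exc)}
        outcome["seconds"] = time.perf_counter() - start
        outcome["query"] = query
        results.append(outcome)
    return results
```

Running the same query list through a federated runner and a local-duplication runner would give comparable timing and failure counts; repeated runs would be needed before drawing conclusions, since remote endpoint latency varies.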
Load-bearing premise
That clinico-genomic data can be modelled as a knowledge graph leveraging state-of-the-art biomedical ontologies and that federated SPARQL queries can effectively perform variant annotation by querying UniProtKB.
What would settle it
A side-by-side test on the same ICAN variants showing that federated SPARQL results miss or misstate functional annotations that standard local-duplication tools correctly retrieve.
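One way such a side-by-side test could be scored is sketched below. It assumes both pipelines have already been reduced to a mapping from variant identifier to a set of annotation strings; that normalisation step is the hard part and is not shown. The function name and data shape are illustrative, not taken from the paper.

```python
def compare_annotations(federated: dict[str, set[str]],
                        baseline: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Report, per variant, annotations missing from or extra in the federated run."""
    report = {}
    for variant_id in sorted(set(federated) | set(baseline)):
        fed = federated.get(variant_id, set())
        base = baseline.get(variant_id, set())
        missing = base - fed  # found by the baseline only
        extra = fed - base    # found by the federated query only
        if missing or extra:
            report[variant_id] = {"missing": missing, "extra": extra}
    return report
```

An empty report on a representative variant set would support the claim; non-empty `missing` entries would be the evidence against it described above.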
Original abstract
Sensitive health data should preferentially be analysed on site. In typical bioinformatics workows, public databases are duplicated and used by specialised tools to enrich the local datasets. In the case of genomic variation data, this process is called variant annotation. In this session we demonstrate variant annotation using federated SPARQL queries. We rst overview how clinico-genomic data can be modelled as a knowledge graph (KG), leveraging state-of-the-art biomedical ontologies. We then perform variant annotation by querying UniprotKB, a massive curated KG for gene and proteins. Our approach avoids public data duplication while maintaining genomic data on site and aligning it with FAIR principles. Our use-case is based on the ICAN project, a research program aimed at studying the physiopathology of cerebral berry aneurysms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that clinico-genomic data from the ICAN project can be modeled as a knowledge graph using state-of-the-art biomedical ontologies, enabling variant annotation via federated SPARQL queries against UniProtKB without duplicating public data, thereby maintaining data on-site and aligning with FAIR principles.
Significance. If the demonstration holds with supporting evidence, the work would offer a practical method for privacy-preserving genomic analysis that avoids data duplication, potentially advancing secure, distributed bioinformatics workflows in sensitive health data contexts.
major comments (2)
- [Abstract] The central claim that federated SPARQL queries 'perform variant annotation by querying UniprotKB' is unsupported, as the manuscript supplies no query examples, schema mappings, performance metrics, annotation accuracy results, or comparison to local duplication methods.
- [Abstract] The assertion that the approach 'avoids public data duplication while maintaining genomic data on site' lacks any implementation details, error analysis, or validation on the ICAN dataset to confirm that the queries achieve the required functional annotations at scale.
minor comments (3)
- [Abstract] Typo 'workows' should be 'workflows'.
- [Abstract] Typo 'rst' should be 'first'.
- [Abstract] Spelling 'UniprotKB' should be standardized to 'UniProtKB' for consistency with the database name.
Simulated Author's Rebuttal
We thank the referee for the detailed comments on our manuscript. We address each major comment below and will revise the manuscript to provide the requested supporting details and evidence.
- Referee: [Abstract] The central claim that federated SPARQL queries 'perform variant annotation by querying UniprotKB' is unsupported, as the manuscript supplies no query examples, schema mappings, performance metrics, annotation accuracy results, or comparison to local duplication methods.
  Authors: We agree that the abstract would be strengthened by explicit supporting evidence. In the revised manuscript we will add concrete SPARQL query examples, describe the schema mappings used to link the ICAN clinico-genomic KG to UniProtKB, report performance metrics and annotation accuracy from our runs, and include a brief comparison against a local-duplication baseline.
  Revision: yes
- Referee: [Abstract] The assertion that the approach 'avoids public data duplication while maintaining genomic data on site' lacks any implementation details, error analysis, or validation on the ICAN dataset to confirm that the queries achieve the required functional annotations at scale.
  Authors: We accept this observation. The revision will supply implementation details of the federated endpoint configuration, an error analysis of query failures or incomplete annotations, and validation results on the ICAN dataset. We will also report dataset size and query execution times to address the question of scale.
  Revision: yes
Circularity Check
No circularity: demonstration of existing federated querying technique with no derivations or self-referential reductions
The paper describes modeling clinico-genomic data as a KG and performing variant annotation via federated SPARQL queries against UniProtKB for the ICAN project. It contains no equations, parameters, fitted values, or derivation chains. The central claims rest on the application of standard biomedical ontologies and SPARQL federation, which are external to the paper and not reduced to self-citations or definitions within it. No load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Clinico-genomic data can be modelled as a knowledge graph leveraging state-of-the-art biomedical ontologies.
- domain assumption: Federated SPARQL queries against UniProtKB can perform variant annotation without local data duplication.