dashi: A Python library for Dataset Shift Characterization to Support Trustworthy AI Development and Deployment

Ángel Sánchez-García; Carlos Sáez; David Fernández-Narro; Juan M. García-Gómez; Pablo Ferri

arxiv: 2605.31360 · v1 · pith:GZPVAWI7new · submitted 2026-05-29 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-28 22:56 UTC · model grok-4.3

keywords dataset shift · Python library · trustworthy AI · information geometry · health AI · covariate shift · temporal shift · multi-source variability

The pith

The dashi Python library quantifies dataset shifts with unsupervised information geometry metrics and supervised performance checks to support trustworthy AI.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces dashi, an open-source Python library for exploring, quantifying, and characterizing dataset shifts where train and test data distributions differ. It offers an unsupervised approach based on information geometry and non-parametric statistical manifolds, plus a supervised approach that measures model performance degradation, both operating on user-defined temporal and multi-source batches. The library is demonstrated on health AI case studies involving gestational diabetes, COVID-19, and emergency medical dispatch. A sympathetic reader would care because uncontrolled shifts can degrade AI models and compromise safety, especially in healthcare where patient rights are at stake. By supplying visual analytics and variability metrics, the work aims to enable more coherent data assessment throughout the AI lifecycle.

Core claim

dashi is a Python library that takes a dual approach to dataset shift analysis. The unsupervised method uses information geometry and non-parametric statistical manifolds to characterize data variability, through metrics such as Global Probabilistic Deviation and Source Probabilistic Outlyingness and through Information Geometric Temporal plots. The supervised method quantifies model performance degradation. Both methods apply across user-defined temporal and domain or source batches.
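The unsupervised side of this claim rests on comparing the probability distributions of batches. The source does not describe dashi's API or say which distribution distance it uses, so the following is only a minimal sketch under assumptions: it uses the Jensen-Shannon distance (an assumed choice), simulated one-dimensional data, and hypothetical helper names (`histogram`, `js_distance`). It does not compute Global Probabilistic Deviation or Source Probabilistic Outlyingness.

```python
import math
import random

def histogram(values, lo, hi, n_bins):
    """Normalized histogram of `values` over n_bins equal-width bins on [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into the edge bins
        counts[idx] += 1
    return [c / len(values) for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 divergence, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(divergence, 0.0))

# Simulated batches (assumption): the 2022 batch has a shifted mean.
rng = random.Random(0)
batch_means = {"2020": 0.0, "2021": 0.0, "2022": 1.5}
hists = {
    name: histogram([rng.gauss(mu, 1.0) for _ in range(2000)], -5.0, 7.0, 24)
    for name, mu in batch_means.items()
}

# Pairwise distance matrix between batches.
dist = {a: {b: js_distance(hists[a], hists[b]) for b in hists} for a in hists}
for a in dist:
    print(a, " ".join(f"{dist[a][b]:.3f}" for b in dist))
```

In this toy setting the two unshifted batches sit close together and the shifted batch sits far from both, which is the kind of pattern a batch-level variability metric is meant to expose.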

What carries the argument

The dual unsupervised-supervised framework that applies information geometry and non-parametric statistical manifolds for variability metrics alongside performance degradation analysis on temporal and multi-source batches.
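One way to picture how batches could be laid out on a low-dimensional map, in the spirit of the Information Geometric Temporal plots named in the paper, is classical multidimensional scaling of a batch-to-batch distance matrix. This is a sketch under assumptions: the source does not say how dashi builds its projections, the function name `classical_mds` is hypothetical, and the distance matrix below is made up for illustration.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed points so that Euclidean distances approximate the given distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))  # drop tiny negative eigenvalues
    return eigvecs[:, order] * scale

# Hypothetical monthly batches that drift along a single direction,
# with a jump between March and April.
months = ["Jan", "Feb", "Mar", "Apr"]
positions = np.array([0.0, 0.1, 0.2, 0.6])
dist = np.abs(positions[:, None] - positions[None, :])

coords = classical_mds(dist, n_components=2)
for month, (x, y) in zip(months, coords):
    print(f"{month}: ({x:+.3f}, {y:+.3f})")
```

Plotting `coords` in batch order would give a trajectory in which consecutive stable months cluster and an abrupt change appears as a long step.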

If this is right

Shifts can be quantified and visualized across both temporal and multi-source batches using the supplied metrics.
Model performance changes due to shifts can be tracked through the supervised component.
Interactive analytics enable assessment of data coherence to guide AI pipeline decisions.
The tools apply to training and operational stages to help maintain reliability in health AI systems.
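The supervised idea in the list above, tracking model performance per batch, can be sketched as follows. This is not dashi's API: the toy threshold model, the simulated concept shift, and the names `make_batch`, `fit_threshold`, and `accuracy` are all assumptions made for illustration.

```python
import random

def make_batch(rng, n, boundary):
    """Simulate one batch: x ~ N(0, 1); the label is 1 when x exceeds the batch's true boundary."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [(x, int(x > boundary)) for x in xs]

def fit_threshold(batch):
    """Toy 'model': the midpoint between the two class means of the training batch."""
    pos = [x for x, y in batch if y == 1]
    neg = [x for x, y in batch if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, batch):
    """Fraction of samples whose thresholded prediction matches the label."""
    return sum(int(x > threshold) == y for x, y in batch) / len(batch)

# Simulated concept shift (assumption): the true decision boundary moves in Q3 and Q4.
rng = random.Random(42)
true_boundaries = {"2021-Q1": 0.0, "2021-Q2": 0.0, "2021-Q3": 0.5, "2021-Q4": 1.0}
batches = {name: make_batch(rng, 2000, b) for name, b in true_boundaries.items()}

# Fit once on the reference batch, then evaluate on every batch.
reference = "2021-Q1"
model = fit_threshold(batches[reference])
baseline = accuracy(model, batches[reference])

drops = {}
for name, batch in batches.items():
    acc = accuracy(model, batch)
    drops[name] = baseline - acc
    print(f"{name}: accuracy={acc:.3f}  drop vs reference={drops[name]:+.3f}")
```

Because the model is frozen after the reference batch, any drop in later batches reflects a change in the data rather than in the model.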

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same metrics could support ongoing monitoring once a model is deployed rather than only during development.
Integration with existing training workflows might allow automatic retraining triggers when certain shift thresholds are crossed.
The library's structure could be tested on non-health domains such as financial or sensor data where distribution changes are also common.
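A retraining trigger of the kind suggested above could be as simple as a rule over per-batch shift scores. The sketch below is hypothetical throughout: the function name `retraining_alerts`, the threshold, the patience value, and the scores are illustrative and come neither from the paper nor from dashi.

```python
def retraining_alerts(shift_scores, threshold, patience=2):
    """Return the batch labels at which the shift score has exceeded
    `threshold` for `patience` consecutive batches."""
    alerts = []
    run = 0  # length of the current run of above-threshold batches
    for label, score in shift_scores:
        run = run + 1 if score > threshold else 0
        if run == patience:  # fire once per run, when it first reaches `patience`
            alerts.append(label)
    return alerts

# Hypothetical monthly shift scores relative to a reference batch.
scores = [
    ("2024-01", 0.05),
    ("2024-02", 0.32),  # isolated spike, ignored with patience=2
    ("2024-03", 0.08),
    ("2024-04", 0.35),
    ("2024-05", 0.41),  # second consecutive exceedance
    ("2024-06", 0.44),
]

alerts = retraining_alerts(scores, threshold=0.3, patience=2)
print(alerts)  # ['2024-05']
```

Requiring consecutive exceedances is one way to avoid retraining on a single noisy batch; the right threshold would have to be calibrated against observed performance changes.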

Load-bearing premise

That the unsupervised metrics derived from information geometry and non-parametric manifolds deliver actionable characterization of shifts that meaningfully supports AI trustworthiness and safety.

What would settle it

A controlled test on health data in which shifts detected and measured by dashi show no consistent correlation with actual drops in model accuracy or increases in safety risks.
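Such a test amounts to checking whether per-batch shift scores move together with per-batch accuracy drops. A rank correlation is one reasonable way to measure that; the sketch below computes Spearman's coefficient with the standard library. The numbers are invented for illustration and are not results from the paper.

```python
import math

def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-batch values: unsupervised shift score and observed accuracy drop.
shift_scores = [0.05, 0.08, 0.12, 0.30, 0.42, 0.55]
accuracy_drops = [0.00, 0.01, 0.00, 0.06, 0.11, 0.09]

rho = spearman(shift_scores, accuracy_drops)
print(f"Spearman rho = {rho:.3f}")
```

A coefficient near zero across many batches and datasets would support the falsifying scenario described above; a consistently high one would argue against it.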

Figures

Figures reproduced from arXiv: 2605.31360 by Ángel Sánchez-García, Carlos Sáez, David Fernández-Narro, Juan M. García-Gómez, Pablo Ferri.

Figure 2: Characterization of temporal concept shift in the simulated Gestational Diabetes Mellitus (GDM) dataset using dashi. (A, B) Data temporal heatmaps illustrating the evolution of the fasting glucose probability distribution for positive (A) and negative (B) GDM cases. (C) Information Geometric Temporal (IGT) projection plot visualizing the trajectory of the multivariate conditional distribution over time. D1…

Figure 3: Characterization of multi-source variability in the Mexico COVID-19 dataset using dashi. (A) Multivariate data source map visualizing the distribution differences across sites utilizing the first three principal components of PCA. (B) Multi-source variability (MSV) metrics plot quantifying cross-site disparities. Distance between sites is determined through Global Probabilistic Deviation (GPD) and the colo…

Figure 4: Characterization of 112 Emergency Calls dataset’s temporal variability using dashi. (A) Multivariate data temporal heatmap displaying the evolution of contextual embeddings across the first three principal components following Singular Value Decomposition (SVD). (B) Information Geometric Temporal (IGT) projection plot characterizing the temporal variability of the dataset across monthly batches. (C) Condit…

Original abstract

The Artificial Intelligence (AI) life cycle requires a thorough understanding of the underlying data dynamics for robust, safe and cost-effective AI development and use. Dataset shifts are defined as changes between train and test data distributions. Whether occurring over time (temporal) or across different sites (multi-source), they can severely degrade model performance and compromise data quality. This is particularly important in health AI, where the safety and fundamental rights of patients can be severely affected by uncontrolled shifts both at training and operational stages. While the theoretical foundations of covariate, prior, and concept shifts are well established, there is a lack of accessible and comprehensive software tools to perform their analysis. We introduce dashi, an open-source Python library designed for the exploration, quantification, and characterization of dataset shifts. dashi provides a dual approach: an unsupervised approach that leverages information geometry and non-parametric statistical manifolds to data variability characterization and analysis (e.g., Information Geometric Temporal plots and Multi-Source Variability metrics like Global Probabilistic Deviation and Source Probabilistic Outlyingness), and a supervised approach that quantifies and characterizes model performance degradation. Both unsupervised and supervised approaches work across user-defined temporal and domain/source batches. We demonstrate the utility of dashi on three simulated and real-world health AI case studies on gestational diabetes mellitus, COVID-19 and emergency medical dispatch. By providing interactive visual analytics and variability metrics, dashi supports trustworthiness of AI life cycle stages enabling robust and safe machine learning pipelines through the assessment of data coherence and AI performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

dashi is a new Python library packaging info-geometry and supervised checks for dataset shifts in health AI, but the abstract gives almost no implementation or validation detail.

The main thing to know is that this paper ships an open-source Python library called dashi that combines unsupervised methods based on information geometry with supervised checks for model performance drop, all aimed at temporal and multi-source shifts in health data.

What it actually does is pull together existing ideas—non-parametric statistical manifolds, metrics like Global Probabilistic Deviation and Source Probabilistic Outlyingness, plus interactive plots—into one package that works across user-defined batches. The three case studies on gestational diabetes, COVID-19, and emergency dispatch show the intended workflow. For someone building or deploying health models, having a single tool that handles both unsupervised variability and supervised degradation in the same framework could save time.

The soft spot is that the write-up stays at the feature-list level. There are no reported benchmarks against other shift detectors, no error analysis on the new metrics, and no code-level description of how the manifolds or outlyingness scores are computed. Without those, it is hard to judge whether the library adds more than a convenient wrapper around standard statistical tests.

This is for practitioners who need practical monitoring tools rather than new theory. A reader already working on trustworthy health AI pipelines might find the batch handling and dual approach worth trying.

It is worth sending to peer review so referees can look at the actual implementation and any reproducibility materials.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces dashi, an open-source Python library for the exploration, quantification, and characterization of dataset shifts. It offers a dual approach: an unsupervised method based on information geometry and non-parametric statistical manifolds (including Information Geometric Temporal plots and metrics such as Global Probabilistic Deviation and Source Probabilistic Outlyingness) plus a supervised method for quantifying model performance degradation. Both operate over user-defined temporal and domain/source batches. Utility is demonstrated via three simulated and real-world health AI case studies (gestational diabetes mellitus, COVID-19, and emergency medical dispatch). The central claim is that the library's interactive visual analytics and variability metrics support trustworthy AI development and deployment by assessing data coherence.

Significance. If the implemented metrics and visualizations prove reliable and actionable, an open-source library providing both unsupervised geometric and supervised performance-based shift tools would address a genuine gap in accessible software for dataset shift analysis, particularly in high-stakes health AI applications where shifts can affect safety and rights. The dual unsupervised/supervised design and batch flexibility are explicit strengths that could facilitate reproducible pipelines.

major comments (2)

[Case Studies] Case Studies section: the three demonstrations are described at a high level but the manuscript provides no quantitative validation metrics, error analysis, or comparison against existing shift-detection baselines for the unsupervised metrics (Global Probabilistic Deviation, Source Probabilistic Outlyingness). This is load-bearing for the claim that the tools meaningfully support AI trustworthiness and safety.
[Methods] Methods / Unsupervised Approach: the information-geometric and non-parametric manifold constructions are referenced but lack explicit algorithmic pseudocode, parameter settings, or sensitivity analysis, preventing independent assessment of whether the metrics are robust or merely descriptive.

minor comments (2)

The manuscript would benefit from a dedicated 'Availability and Installation' subsection that includes the exact GitHub or PyPI link, license, and minimum Python/dependency versions.
Notation for the variability metrics is introduced in prose; adding a short mathematical definitions table or appendix would improve clarity for readers unfamiliar with information geometry.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for strengthening the manuscript. We agree that both major points require attention and will revise the paper to address them directly.

Referee: [Case Studies] Case Studies section: the three demonstrations are described at a high level but the manuscript provides no quantitative validation metrics, error analysis, or comparison against existing shift-detection baselines for the unsupervised metrics (Global Probabilistic Deviation, Source Probabilistic Outlyingness). This is load-bearing for the claim that the tools meaningfully support AI trustworthiness and safety.

Authors: We agree that the case studies, as currently presented, are primarily illustrative and do not include the requested quantitative validation, error analysis, or baseline comparisons. This limits the strength of the claims regarding support for AI trustworthiness. In the revised manuscript we will add quantitative evaluations of the unsupervised metrics (e.g., correlation with known distribution changes, comparison of Global Probabilistic Deviation and Source Probabilistic Outlyingness against baselines such as Kolmogorov-Smirnov tests and other shift detectors), along with error analysis and discussion of how these metrics relate to downstream model performance degradation.

revision: yes

Referee: [Methods] Methods / Unsupervised Approach: the information-geometric and non-parametric manifold constructions are referenced but lack explicit algorithmic pseudocode, parameter settings, or sensitivity analysis, preventing independent assessment of whether the metrics are robust or merely descriptive.

Authors: We concur that the absence of pseudocode, explicit parameter settings, and sensitivity analysis hinders reproducibility and independent evaluation. The revised Methods section will include algorithmic pseudocode for the core unsupervised procedures (information-geometric manifold construction, Global Probabilistic Deviation, and Source Probabilistic Outlyingness), the specific parameter values used in the library implementation, and a sensitivity analysis examining robustness to key hyperparameters and data characteristics.

revision: yes
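The first response names Kolmogorov-Smirnov tests as one baseline for the promised comparison. For orientation, the two-sample KS statistic is the largest gap between two empirical cumulative distribution functions. The sketch below computes it with the standard library on simulated data; the function name `ks_statistic` and the data are assumptions, and no p-value is computed (`scipy.stats.ks_2samp` provides both the statistic and a p-value).

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # The gap between two step functions can only change at observed values,
    # so it is enough to evaluate both CDFs at every data point.
    for v in a + b:
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Simulated univariate feature (assumption): one batch from the reference
# distribution and one batch with a shifted mean.
rng = random.Random(7)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
same = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]

ks_same = ks_statistic(reference, same)
ks_shifted = ks_statistic(reference, shifted)
print(f"reference vs same distribution:    D = {ks_same:.3f}")
print(f"reference vs shifted distribution: D = {ks_shifted:.3f}")
```

A baseline like this works one variable at a time, which is one reason a comparison against multivariate, batch-level metrics would be informative.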

Circularity Check

0 steps flagged

No significant circularity

The paper presents a software library for dataset shift analysis with no mathematical derivations, equations, predictions, or fitted parameters. All claims concern implementation of existing concepts (information geometry, variability metrics) and case-study demonstrations; no step reduces by construction to its own inputs, and no self-citation chain is load-bearing for a theoretical result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software library introduction paper; no free parameters, mathematical axioms, or invented entities are described in the abstract.

