Recognition: unknown
mmid: Multi-Modal Integration and Downstream analyses for healthcare analytics in Python
Pith reviewed 2026-05-10 17:08 UTC · model grok-4.3
The pith
A Python package fuses imaging, electrical, and genetic heart data to identify cardiovascular disease earlier and more accurately than single sources.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
mmid is a Python package offering multi-modal fusion and imputation along with classification, time-to-event prediction, and clustering under one interface. In the showcase with cardiac magnetic resonance imaging, electrocardiogram, and polygenic risk scores from the UK Biobank, the modalities provided joint and individual information that supported early cardiovascular disease identification before clinical manifestations and with greater effectiveness than any single modality. The package further enabled imputation of partially observed modalities while maintaining downstream prediction performance.
What carries the argument
The mmid Python package, which integrates multiple algorithms for multi-modal data decomposition, imputation, prediction, and clustering through a single command interface and configuration files.
If this is right
- The combined modalities capture both shared patterns and unique details that aid in early cardiovascular disease detection.
- The multi-modal approach yields better disease prediction results than using cardiac MRI, ECG, or polygenic risk scores separately.
- Imputation of missing data modalities incurs no substantial reduction in prediction accuracy for cardiovascular outcomes.
- The package structure promotes reproducibility by allowing analyses to be run via configuration files.
- Downstream tasks such as time-to-event analysis and clustering become straightforward within the same multi-modal framework.
Where Pith is reading between the lines
- The imputation capability could extend the usable sample size in studies where not all participants have complete multi-modal measurements.
- Similar integration methods might apply to predicting other conditions with available imaging, signal, and genetic data.
- Testing the package on datasets from different populations would check if the performance benefits generalize.
Load-bearing premise
The specific algorithms for fusion, imputation, and prediction in the package correctly identify and combine the meaningful information from the three data types without being overly tuned to this one dataset.
What would settle it
A test on new data from a different group of people where the multi-modal predictions show no advantage over the strongest individual data source or where imputed values lead to clearly worse predictions.
Figures
read the original abstract
mmid (Multi-Modal Integration and Downstream analyses for healthcare analytics) is a Python package that offers multi-modal fusion and imputation, classification, time-to-event prediction and clustering functionalities under a single interface, filling the gap of sequential data integration and downstream analyses for healthcare applications in a structured and flexible environment. mmid wraps in a unique package several algorithms for multi-modal decomposition, prediction and clustering, which can be combined smoothly with a single command and proper configuration files, thus facilitating reproducibility and transferability of studies involving heterogeneous health data sources. A showcase on personalised cardiovascular risk prediction is used to highlight the relevance of a composite pipeline enabling proper treatment and analysis of complex multi-modal data. We thus employed mmid in an example real application scenario involving cardiac magnetic resonance imaging, electrocardiogram, and polygenic risk scores data from the UK Biobank. We proved that the three modalities captured joint and individual information that was used to (1) early identify cardiovascular disease before clinical manifestations with cardiological relevance, and (2) do it better than single data sources alone. Moreover, mmid allowed to impute partially observable data modalities without considerable performance losses in downstream disease prediction, thus proving its relevance for real-world health analytics applications (which are often characterised by the presence of missing data).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces mmid, a Python package providing a unified interface for multi-modal fusion, imputation, classification, time-to-event prediction, and clustering in healthcare analytics. It wraps existing algorithms for these tasks and uses configuration files to promote reproducibility. The central demonstration is a UK Biobank showcase combining cardiac MRI, ECG, and polygenic risk scores, asserting that the modalities extract joint and individual information enabling early cardiovascular disease identification before clinical manifestations, with superior performance to single modalities, and that imputation of missing modalities incurs no considerable loss in downstream prediction accuracy.
Significance. If the empirical claims are substantiated, the package would address a practical gap by offering an integrated, reproducible workflow for heterogeneous health data, particularly useful for handling missing modalities in real-world cohorts. The showcase suggests utility for pre-symptomatic CVD risk stratification using imaging, electrophysiological, and genetic sources, which could aid transferability of multi-modal studies.
major comments (2)
- [Abstract] Abstract: The assertions that the three modalities 'captured joint and individual information' to 'early identify cardiovascular disease before clinical manifestations with cardiological relevance' and 'do it better than single data sources alone' are presented without any quantitative metrics (e.g., AUC, hazard ratios, p-values), baseline comparisons, error bars, or details on the specific fusion/imputation algorithms and validation procedures. This is load-bearing for the central empirical claim of superiority and effective imputation.
- [Showcase section] Showcase/results description: The manuscript frames the UK Biobank application as proving the package's relevance but supplies no tables or figures with performance numbers for multi-modal vs. single-modality models, no description of how joint/individual components were extracted or validated, and no assessment of potential dataset-specific biases or overfitting in the chosen cohort.
minor comments (2)
- [Abstract] The abstract uses 'we proved' for empirical results; consider rephrasing to 'we demonstrate' or 'we show' to reflect the illustrative nature of the showcase.
- [Methods] Ensure the full manuscript includes a dedicated methods subsection detailing the wrapped algorithms, configuration options, and exact pipeline steps used in the showcase for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript introducing the mmid package. The comments correctly identify areas where the empirical claims require stronger quantitative support and clearer presentation of methods and results. We address each major comment below and commit to revisions that will substantiate the key findings without altering the core contributions of the work.
read point-by-point responses
-
Referee: [Abstract] Abstract: The assertions that the three modalities 'captured joint and individual information' to 'early identify cardiovascular disease before clinical manifestations with cardiological relevance' and 'do it better than single data sources alone' are presented without any quantitative metrics (e.g., AUC, hazard ratios, p-values), baseline comparisons, error bars, or details on the specific fusion/imputation algorithms and validation procedures. This is load-bearing for the central empirical claim of superiority and effective imputation.
Authors: We agree that the abstract would be strengthened by including key quantitative results. In the revised manuscript, we will add concise statements of the main performance metrics from the UK Biobank showcase, including AUC values and improvements for the multi-modal model over single-modality baselines, hazard ratios for the time-to-event analysis, and associated p-values. We will also specify the primary fusion and imputation algorithms employed (e.g., the multi-modal decomposition and imputation routines wrapped by mmid) and note the cross-validation procedure used. These additions will be kept brief to respect abstract length constraints while directly addressing the load-bearing claims. revision: yes
-
Referee: [Showcase section] Showcase/results description: The manuscript frames the UK Biobank application as proving the package's relevance but supplies no tables or figures with performance numbers for multi-modal vs. single-modality models, no description of how joint/individual components were extracted or validated, and no assessment of potential dataset-specific biases or overfitting in the chosen cohort.
Authors: The current showcase section is intentionally high-level to demonstrate package usage rather than serve as a full results paper. We acknowledge that this leaves the empirical claims under-supported in the text. We will expand the section to include a summary table of performance metrics comparing multi-modal fusion against single-modality baselines (AUC, concordance index, etc.), with error bars or confidence intervals where applicable. We will describe the extraction of joint and individual components via the specific mmid decomposition functions and their validation through hold-out testing. Finally, we will add a brief discussion of UK Biobank cohort characteristics, potential selection biases, and mitigation steps such as stratified cross-validation to address overfitting concerns. These changes will be integrated into the results and discussion sections. revision: yes
Circularity Check
No significant circularity
full rationale
The paper is a software package description (mmid) that wraps existing multi-modal decomposition, imputation, prediction and clustering algorithms, then demonstrates them empirically on external UK Biobank cardiac MRI, ECG and polygenic risk score data. No mathematical derivations, equations, fitted parameters or theoretical claims are presented whose outputs reduce by construction to the inputs. All performance results are obtained from held-out or external data splits rather than self-defined quantities, and no load-bearing self-citations or ansatzes are invoked to justify core results.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Multimodal Integration in Health Care: Development With Applications in Disease Management
Yan Hao et al. Multimodal Integration in Health Care: Development With Applications in Disease Management. J Med Internet Res, 27:e76557, 2025
2025
-
[2]
Multimodal Learning for Multi-omics: A Survey.World Scientific Annual Review of Artificial Intelligence, 01:2250004, 2023
Sina Tabakhi et al. Multimodal Learning for Multi-omics: A Survey.World Scientific Annual Review of Artificial Intelligence, 01:2250004, 2023
2023
-
[3]
Multi-Omics Factor Analysis—a framework for unsupervised integration of multi- omics data sets.Molecular Systems Biology, 14(6):e8124, 2018
Ricard Argelaguet et al. Multi-Omics Factor Analysis—a framework for unsupervised integration of multi- omics data sets.Molecular Systems Biology, 14(6):e8124, 2018
2018
-
[4]
Joint and Individual variation explained (JIVE) for integrated analysis of multiple data types
Eric Lock et al. Joint and Individual variation explained (JIVE) for integrated analysis of multiple data types. The annals of applied statistics, 7:523–542, 2013
2013
-
[5]
Angle-based joint and individual variation explained.Journal of Multivariate Analysis, 166:241–265, 2018
Qing Feng et al. Angle-based joint and individual variation explained.Journal of Multivariate Analysis, 166:241–265, 2018. 19
2018
-
[6]
Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling.Frontiers in Oncology, 10, 2020
Marco Chierici et al. Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling.Frontiers in Oncology, 10, 2020
2020
-
[7]
MIDAA: deep archetypal analysis for interpretable multi-omic data integration based on biological principles.Genome Biology, 26, 2025
Salvatore Milite et al. MIDAA: deep archetypal analysis for interpretable multi-omic data integration based on biological principles.Genome Biology, 26, 2025
2025
-
[8]
Multi-view learning and omics integration: a unified perspective with applications to healthcare.MOX Reports, 2026
Valeria Iapaolo et al. Multi-view learning and omics integration: a unified perspective with applications to healthcare.MOX Reports, 2026
2026
-
[10]
MUON: multimodal omics analysis framework.Genome Biology, 23:42, 2022
Danila Bredikhin et al. MUON: multimodal omics analysis framework.Genome Biology, 23:42, 2022
2022
-
[11]
mixOmics: An R package for ‘omics feature selection and multiple data integration
Florian Rohart et al. mixOmics: An R package for ‘omics feature selection and multiple data integration. PLOS Computational Biology, 13(11):1–19, 2017
2017
-
[12]
A machine learning approach for multimodal data fusion for survival prediction in cancer patients.npj Precision Oncology, 9:128, 2025
Nikos Nikolaou et al. A machine learning approach for multimodal data fusion for survival prediction in cancer patients.npj Precision Oncology, 9:128, 2025
2025
-
[13]
The role of cardiac magnetic resonance (CMR) in the diagnosis of cardiomy- opathy: A systematic review.Malawi Medical Journal, 30(4):291–295, 2018
Henry Anselmo Mayala et al. The role of cardiac magnetic resonance (CMR) in the diagnosis of cardiomy- opathy: A systematic review.Malawi Medical Journal, 30(4):291–295, 2018
2018
-
[14]
Research on atrial fibrillation diagnosis in electrocardiograms based on CLA-AF model
Jiajia Si et al. Research on atrial fibrillation diagnosis in electrocardiograms based on CLA-AF model. European Heart Journal - Digital Health, 6(1):82–95, 2024
2024
-
[15]
Polygenic risk scores in cardiovascular risk prediction: A cohort study and modelling analyses.PLOS Medicine, 18(1):1–22, 2021
Luanluan Sun et al. Polygenic risk scores in cardiovascular risk prediction: A cohort study and modelling analyses.PLOS Medicine, 18(1):1–22, 2021
2021
-
[16]
MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data.Genome Biology, 21(1):111, 2020
Ricard Argelaguet et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data.Genome Biology, 21(1):111, 2020
2020
-
[17]
mvlearn: Multiview Machine Learning in Python.Journal of Machine Learning Research, 22(109):1–7, 2021
Ronan Perry et al. mvlearn: Multiview Machine Learning in Python.Journal of Machine Learning Research, 22(109):1–7, 2021
2021
-
[18]
Statsmodels: Econometric and Statistical Modeling with Python
Skipper Seabold and Josef Perktold. Statsmodels: Econometric and Statistical Modeling with Python. In Proceedings of the 9th Python in Science Conference, pages 92–96, 2010
2010
-
[19]
Tianqi Chen et al.xgboost: Extreme Gradient Boosting, 2025
2025
-
[20]
Scikit-learn: Machine Learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011
Fabian Pedregosa et al. Scikit-learn: Machine Learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011
2011
-
[21]
lifelines: survival analysis in Python.Journal of Open Source Software, 4(40):1317, 2019
Cameron Davidson-Pilon. lifelines: survival analysis in Python.Journal of Open Source Software, 4(40):1317, 2019
2019
-
[22]
scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn.Journal of Machine Learning Research, 21(212):1–6, 2020
Sebastian P¨olsterl. scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn.Journal of Machine Learning Research, 21(212):1–6, 2020
2020
-
[23]
Katzman et al
Jared L. Katzman et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network.BMC Medical Research Methodology, 18(1):24, 2018
2018
-
[24]
PySurvival: Open source package for Survival Analysis modeling, 2019
Stephane Fotso et al. PySurvival: Open source package for Survival Analysis modeling, 2019
2019
-
[25]
DeepHit: A Deep Learning Approach to Survival Analysis With Competing Risks
Changhee Lee et al. DeepHit: A Deep Learning Approach to Survival Analysis With Competing Risks. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), 2018
2018
-
[26]
Time-to-Event Prediction with Neural Networks and Cox Regression.Journal of Machine Learning Research, 20(129):1–30, 2019
H˚avard Kvamme et al. Time-to-Event Prediction with Neural Networks and Cox Regression.Journal of Machine Learning Research, 20(129):1–30, 2019
2019
-
[27]
Ritchie et al
Scott C. Ritchie et al. Combined clinical, metabolomic, and polygenic scores for cardiovascular risk prediction. European Heart Journal, page ehaf947, 2025
2025
-
[28]
D’Agostino et al
Ralph B. D’Agostino et al. General Cardiovascular Risk Profile for Use in Primary Care.Circulation, 117(6):743–753, 2008
2008
-
[29]
Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study.BMJ, 357, 2017
Julia Hippisley-Cox et al. Development and validation of QRISK3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study.BMJ, 357, 2017
2017
-
[30]
Family history of cardiovascular disease.Canadian family physician M´edecin de famille canadien, 60(11):1016, 2014
Michael Kolber and Cathy Scrimshaw. Family history of cardiovascular disease.Canadian family physician M´edecin de famille canadien, 60(11):1016, 2014. 20
2014
-
[31]
UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age.PLOS Medicine, 12(3):1–10, 2015
Cathie Sudlow et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age.PLOS Medicine, 12(3):1–10, 2015
2015
-
[32]
A population-based phenome-wide association study of cardiac and aortic structure and function.Nature Medicine, 26(10):1654–1662, 2020
Wenjia Bai et al. A population-based phenome-wide association study of cardiac and aortic structure and function.Nature Medicine, 26(10):1654–1662, 2020
2020
-
[33]
Thompson et al
Deborah J. Thompson et al. A systematic evaluation of the performance and properties of the UK Biobank Polygenic Risk Score (PRS) Release.PLOS ONE, 19(9):1–24, 2024
2024
-
[34]
Scanner-Agnostic MRI Harmonization via SSIM-Guided Disentanglement.arXiv, 2025
Luca Caldera et al. Scanner-Agnostic MRI Harmonization via SSIM-Guided Disentanglement.arXiv, 2025. 21 Supplementary tables Disease subtype Defined as the first among these events Excluding subjects that experienced any of these events before baseline AA ICD-10:I48 ICD-10:I48 CAD ICD-10:I21, I22, I23, I24.1, I25.2 ICD-9:410, 411, 412, 429.79 OPCS-4:K40.1-...
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.