Recognition: unknown
NEURON: A Neuro-symbolic System for Grounded Clinical Explainability
Pith reviewed 2026-05-09 15:22 UTC · model grok-4.3
The pith
A neuro-symbolic system combines medical ontology with machine learning and language models to raise prediction accuracy while generating natural-language clinical explanations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NEURON integrates SNOMED CT ontology-informed structural representations with machine learning models to enhance predictive reliability. A Retrieval-Augmented Generation grounded LLM layer then synthesizes SHAP feature attributions and patient-specific clinical notes into coherent natural-language explanations, producing AUC scores of 0.84-0.88 on MIMIC-IV acute heart failure mortality prediction while scoring 0.85 on human-aligned metrics compared with 0.50 for raw SHAP visualizations.
What carries the argument
The neuro-symbolic pipeline that first maps raw clinical data to SNOMED CT ontology structures, runs them through an ML predictor, and finally uses RAG-augmented LLM synthesis to turn SHAP attributions plus notes into grounded natural-language text.
If this is right
- Prediction AUC for acute heart failure mortality rises from the 0.74-0.77 range to 0.84-0.88.
- Explanations align with human judgment at 0.85 rather than 0.50 for unprocessed SHAP visualizations.
- The same architecture supplies both higher accuracy and narrative transparency for connected health applications.
- Ontology grounding bridges raw numerical features and standard medical nomenclature.
- The system scales as a deployable engineering solution for trustworthy clinical AI.
Where Pith is reading between the lines
- The same ontology-plus-RAG pattern could be tested on other high-stakes prediction tasks such as sepsis or readmission risk where note quality varies.
- If the explanations remain faithful under conflicting notes, the method might reduce the need for separate post-hoc explanation tools in electronic health record systems.
- Clinicians might adopt the outputs more readily because the language is tied to SNOMED terms they already use in documentation.
- Future work could measure downstream effects on clinical decision time or error rates once the explanations are shown alongside the numeric risk score.
Load-bearing premise
The RAG-grounded LLM layer will continue to produce clinically accurate and faithful explanations even when the retrieved documents or patient notes contain incomplete or conflicting information.
What would settle it
A test set of heart failure cases in which patient notes contain directly contradictory statements about key risk factors, followed by blinded clinician ratings of whether the generated explanations correctly resolve or ignore the contradictions.
Figures
read the original abstract
Clinical AI adoption is hindered by the black-box/grey-box nature of high-performing models, which lack the ontological grounding and narrative transparency required for professional-level explainability. We present NEURON, a neuro-symbolic system designed to enhance both predictive reliability and clinical interpretability. NEURON integrates SNOMED CT ontology-informed structural representations with machine learning models to bridge the gap between raw data and medical nomenclature. To facilitate human-aligned interaction, the system utilizes a Retrieval-Augmented Generation (RAG) grounded LLM layer to synthesize SHAP feature attributions and patient-specific clinical notes into coherent, natural-language explanations. Validated on the MIMIC-IV dataset for Acute Heart Failure mortality prediction, NEURON improved the AUC from 0.74-0.77 to 0.84-0.88 and significantly outperformed raw SHAP visualizations in human-aligned metrics (0.85 vs. 0.50). Our results demonstrate that NEURON offers a robust, scalable engineering solution for deploying trustworthy, human-centered connected health applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces NEURON, a neuro-symbolic system that integrates SNOMED CT ontology-informed structural representations with machine learning models and employs a Retrieval-Augmented Generation (RAG) grounded LLM layer to synthesize SHAP feature attributions and patient-specific clinical notes into coherent natural-language explanations. Validated on the MIMIC-IV dataset for acute heart failure mortality prediction, it reports AUC improvements from 0.74-0.77 to 0.84-0.88 and superior performance on human-aligned metrics (0.85 vs. 0.50) compared to raw SHAP visualizations.
Significance. If the reported gains hold under rigorous experimental controls, the work could meaningfully advance trustworthy clinical AI by bridging predictive performance with ontology-grounded, human-readable explanations, addressing a key barrier to adoption in connected health applications. The neuro-symbolic pipeline combining symbolic ontology, SHAP attributions, and RAG-LLM generation offers a concrete engineering approach worth further development.
major comments (3)
- [Results] Results section: The AUC lift from 0.74-0.77 to 0.84-0.88 is presented without any description of train-test split methodology, hyperparameter search, cross-validation procedure, or statistical significance testing, which are required to substantiate the central performance claim.
- [Methods] Methods (RAG-LLM component): No mechanisms are described for conflict detection, source attribution, or post-generation verification of the LLM output against the underlying predictive model when retrieved SNOMED documents or MIMIC notes contain gaps or contradictions, undermining the claim of faithful grounded explainability.
- [Human Evaluation] Human evaluation: The reported superiority in human-aligned metrics (0.85 vs. 0.50) provides no information on blinding, randomization of presentation order, or controls for evaluator bias, leaving open the possibility that scores reflect presentation artifacts rather than true explanation quality.
minor comments (2)
- [Abstract] Abstract: The baseline and improved AUC ranges (0.74-0.77 and 0.84-0.88) would benefit from explicit mapping to the specific baseline models or conditions that produced each endpoint.
- Consider adding a dedicated limitations paragraph addressing potential failure modes of the RAG layer on incomplete EHR data.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which has helped us strengthen the reporting of our experimental and evaluation protocols. We address each major comment point by point below, with revisions incorporated where the original manuscript was incomplete.
read point-by-point responses
-
Referee: [Results] Results section: The AUC lift from 0.74-0.77 to 0.84-0.88 is presented without any description of train-test split methodology, hyperparameter search, cross-validation procedure, or statistical significance testing, which are required to substantiate the central performance claim.
Authors: We agree that these methodological details are necessary to substantiate the performance claims. The original manuscript omitted a full description of the protocol. In the revised version we have added a dedicated Experimental Setup subsection describing a patient-level stratified 70/30 train-test split (to prevent leakage), hyperparameter optimization via 5-fold cross-validation combined with grid search, and statistical significance assessed via bootstrap resampling (1,000 iterations) with paired tests confirming p < 0.01 for the AUC gains. These additions are now in Section 4.1. revision: yes
-
Referee: [Methods] Methods (RAG-LLM component): No mechanisms are described for conflict detection, source attribution, or post-generation verification of the LLM output against the underlying predictive model when retrieved SNOMED documents or MIMIC notes contain gaps or contradictions, undermining the claim of faithful grounded explainability.
Authors: We concur that explicit safeguards are required for trustworthy grounded explanations. The initial submission did not detail these components. We have expanded the RAG-LLM Methods subsection to describe (i) a conflict-detection step that computes semantic similarity between SNOMED CT concepts and SHAP attributions and flags low-consistency cases, (ii) mandatory source attribution that cites specific retrieved document IDs and note excerpts in every generated explanation, and (iii) a post-generation verification pass that re-queries the predictive model to ensure the narrative remains consistent with the original feature attributions and output probabilities. These mechanisms are now fully specified. revision: yes
-
Referee: [Human Evaluation] Human evaluation: The reported superiority in human-aligned metrics (0.85 vs. 0.50) provides no information on blinding, randomization of presentation order, or controls for evaluator bias, leaving open the possibility that scores reflect presentation artifacts rather than true explanation quality.
Authors: The referee correctly notes that the human-evaluation protocol was under-reported. We have revised the Human Evaluation section to specify that the study was conducted with double-blinding (evaluators unaware of explanation source), randomized presentation order across the 50 participating clinicians, and explicit bias controls including a pilot inter-rater reliability assessment (Krippendorff’s alpha = 0.81) and exclusion of any evaluator with prior model exposure. These details support the validity of the reported metric improvement. revision: yes
Circularity Check
No circularity: empirical results on external dataset with no derivations or self-referential fits
full rationale
The paper presents NEURON as a neuro-symbolic pipeline combining SNOMED CT ontology, ML models, SHAP attributions, and a RAG-grounded LLM layer. All reported performance numbers (AUC lift from 0.74-0.77 to 0.84-0.88 and human-aligned metrics 0.85 vs 0.50) are stated as outcomes of validation experiments on the external public MIMIC-IV dataset for acute heart failure mortality prediction. No equations, first-principles derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims are therefore not equivalent to their inputs by construction; they rest on independent empirical measurement rather than definitional or self-referential reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption SNOMED CT provides a reliable, complete structural representation of clinical concepts and relationships for feature grounding.
Reference graph
Works this paper leans on
-
[1]
Amina Adadi and Mohammed Berrada. 2018. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI).IEEE Access6 (2018), 52138–52160. doi:10.1109/ACCESS.2018.2870052
-
[2]
Sara Alhasan and Reem Alnanih. 2025. Enhancing AI explainability through the EXACT framework: A user-centric approach.IEEE Access13 (2025), 98208–98228. doi:10.1109/ACCESS.2025.3576234
-
[3]
Sajid Ali, Tamer Abuhmed, Shaker El-Sappagh, Khan Muhammad, Jose M Alonso-Moral, Roberto Confalonieri, Riccardo Guidotti, Javier Del Ser, Natalia Díaz-Rodríguez, and Francisco Herrera. 2023. Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence.Information Fusion99 (2023), 101805. doi:10.101...
-
[4]
Shuroug A Alowais, Sahar S Alghamdi, Nada Alsuhebany, Tariq Alqahtani, Abdulrahman I Alshaya, Sumaya N Almohareb, Atheer Aldairem, Mohammed Alrashed, Khalid Bin Saleh, Hisham A Badreldin, et al. 2023. Revolutionizing healthcare: the role of artificial intelligence in clinical practice.BMC Medical Education23, 1 (2023), 689. doi:10.1186/s12909-023-04698-z
-
[5]
David Alvarez-Melis and Tommi S Jaakkola. 2018. On the robustness of interpretability methods.arXiv:1806.08049(2018)
work page Pith review arXiv 2018
-
[6]
Julia Amann, Alessandro Blasimme, Effy Vayena, Dietmar Frey, Vince I Madai, and Precise4Q Consortium. 2020. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective.BMC Medical Informatics and Decision Making20, 1 (2020), 310. doi:10.1186/s12911-020- 01332-6
-
[7]
Lars W Andersen, Won Young Kim, Maureen Chase, Sharri J Mortensen, Ari Moskowitz, Victor Novack, Michael N Cocchi, Michael W Donnino, et al. 2016. The prevalence and significance of abnormal vital signs prior to in-hospital cardiac arrest.Resuscitation98 (2016), 112–117. doi:10.1016/j.resuscitation.2015.08.016
-
[8]
Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, et al . 2020. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI.Information Fusion58 (2020), 82–115. doi:10....
-
[9]
Sujatha Baddam and Bracken Burns. 2025. Systemic inflammatory response syndrome. InStatPearls [Internet]. StatPearls Publishing, Treasure Island, FL, USA. https://www.ncbi.nlm.nih.gov/books/NBK547669/
2025
-
[10]
Sher Badshah and Hassan Sajjad. 2025. Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form QA. InProceedings of the 9th Widening NLP Workshop. 251–267. doi:10.18653/v1/2025.winlp-main.37
-
[11]
TT Bauer, S Ewig, R Marre, N Suttorp, T Welte, and CAPNETZ Study Group. 2006. CRB-65 predicts death from community-acquired pneumonia. Journal of Internal Medicine260, 1 (2006), 93–101. doi:10.1111/j.1365-2796.2006.01657.x
-
[12]
Wobo Bekwelem, Pamela L Lutsey, Laura R Loehr, Sunil K Agarwal, Brad C Astor, Cameron Guild, Christie M Ballantyne, and Aaron R Folsom
-
[13]
doi:10.1016/j.annepidem.2011.06.005
White blood cell count, C-reactive protein, and incident heart failure in the Atherosclerosis Risk in Communities (ARIC) Study.Annals of Epidemiology21, 10 (2011), 739–748. doi:10.1016/j.annepidem.2011.06.005
-
[14]
Subrato Bharati, M Rubaiyat Hossain Mondal, and Prajoy Podder. 2023. A review on explainable artificial intelligence for healthcare: Why, how, and when?IEEE Transactions on Artificial Intelligence5, 4 (2023), 1429–1442. doi:10.1109/TAI.2023.3266418
-
[15]
Mikołaj Błaziak, Szymon Urban, Weronika Wietrzyk, Maksym Jura, Gracjan Iwanek, Bartłomiej Stańczykiewicz, Wiktor Kuliczkowski, Robert Zymliński, Maciej Pondel, Petr Berka, et al. 2022. An artificial intelligence approach to guiding the management of heart failure patients using predictive models: A systematic review.Biomedicines10, 9 (2022), 2188. doi:10....
-
[16]
Michael Böhm, Karl Swedberg, Michel Komajda, Jeffrey S Borer, Ian Ford, Ariane Dubost-Brama, Guy Lerebours, and Luigi Tavazzi. 2010. Heart rate as a risk factor in chronic heart failure (SHIFT): the association between heart rate and outcomes in a randomised placebo-controlled trial.The 16 Anuradha Chandrasekaran, Dimitrios Zikos, Mutlu Mete, Alan Pang, B...
-
[17]
Giovanni Briganti and Olivier Le Moine. 2020. Artificial intelligence in medicine: Today and tomorrow.Frontiers in Medicine7 (2020), 509744. doi:10.3389/fmed.2020.00027
-
[18]
Simone Britsch, Markward Britsch, Simon Lindner, Leonie Hahn, Verena Schneider-Lindner, Thomas Helbing, Manfred Thiel, Daniel Duerschmied, and Tobias Becher. 2025. An interpretable machine learning algorithm enables dynamic 48-hour mortality prediction during an ICU stay. Communications Medicine5, 1 (2025), 426. doi:10.1038/s43856-025-01192-z
-
[19]
Federico Cabitza, Raffaele Rasoini, and Gian Franco Gensini. 2017. Unintended consequences of machine learning in medicine.JAMA318, 6 (2017), 517–518. doi:10.1001/jama.2017.7797
-
[20]
Zijun Chen, Tingming Li, Sheng Guo, Deli Zeng, and Kai Wang. 2023. Machine learning-based in-hospital mortality risk prediction tool for intensive care unit patients with heart failure.Frontiers in Cardiovascular Medicine10 (2023), 1119699. doi:10.3389/fcvm.2023.1119699
-
[21]
Komala Subramanyam Cherukuri and Haihua Chen. 2024. Investigating the applications of AI in oncology from NIH funded projects: Case study of lung cancer and pancreatic cancer.Proceedings of the Association for Information Science and Technology61, 1 (2024), 871–873. doi:10.1002/pra2.1124
-
[22]
Ovidiu Chioncel, Sean P Collins, Andrew P Ambrosy, Mihai Gheorghiade, and Gerasimos Filippatos. 2015. Pulmonary oedema—therapeutic targets. Cardiac Failure Review1, 1 (2015), 38. doi:10.15420/CFR.2015.01.01.38
-
[23]
Carlo Combi, Beatrice Amico, Riccardo Bellazzi, Andreas Holzinger, Jason H Moore, Marinka Zitnik, and John H Holmes. 2022. A manifesto on explainability for artificial intelligence in medicine.Artificial Intelligence in Medicine133 (2022), 102423. doi:10.1016/j.artmed.2022.102423
-
[24]
Roberto Confalonieri and Giancarlo Guizzardi. 2025. On the multiple roles of ontologies in explanations for neuro-symbolic AI.Neurosymbolic Artificial Intelligence1 (2025), NAI–240754. doi:10.3233/NAI-240754
-
[25]
Daniel De Backer, Patrick Biston, Jacques Devriendt, Christian Madl, Didier Chochrad, Cesar Aldecoa, Alexandre Brasseur, Pierre Defrance, Philippe Gottignies, and Jean-Louis Vincent. 2010. Comparison of dopamine and norepinephrine in the treatment of shock.New England Journal of Medicine362, 9 (2010), 779–789. doi:10.1056/NEJMoa0907118
-
[26]
Jay DeYoung, Sarthak Jain, Nazneen Fatema Rajani, Eric Lehman, Caiming Xiong, Richard Socher, and Byron C Wallace. 2020. ERASER: A benchmark to evaluate rationalized NLP models. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 4443–4458. doi:10.18653/v1/2020.acl-main.408
-
[27]
Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning.arXiv:1702.08608(2017)
work page internal anchor Pith review arXiv 2017
-
[28]
Rudresh Dwivedi, Devam Dave, Het Naik, Smiti Singhal, Rana Omer, Pankesh Patel, Bin Qian, Zhenyu Wen, Tejal Shah, Graham Morgan, et al
-
[29]
Explainable AI (XAI): Core ideas, techniques, and solutions.Comput. Surveys55, 9 (2023), 1–33. doi:10.1145/3561048
-
[30]
Hui Feng, Yuntzu Yin, Emiliano Reynares, and Jay Nanavati. 2025. OntologyRAG: Better and faster biomedical code mapping with retrieval- augmented generation (RAG) leveraging ontology knowledge graphs and large language models. InInternational Workshop on Knowledge-Enhanced Information Retrieval. Springer, 71–86. doi:10.1007/978-3-032-02899-0_4
-
[31]
Gregg C Fonarow, Kirkwood F Adams, William T Abraham, Clyde W Yancy, W John Boscardin, ADHERE Scientific Advisory Committee, et al
-
[32]
Risk stratification for in-hospital mortality in acutely decompensated heart failure: classification and regression tree analysis.JAMA293, 5 (2005), 572–580. doi:10.1001/jama.293.5.572
-
[33]
Artur d’Avila Garcez and Luis C Lamb. 2023. Neurosymbolic AI: the 3rd wave.Artificial Intelligence Review56, 11 (2023), 12387–12406. doi:10.1007/s10462-023-10448-w
-
[34]
Manas Gaur and Amit Sheth. 2024. Building trustworthy NeuroSymbolic AI Systems: Consistency, reliability, explainability, and safety.AI Magazine45, 1 (2024), 139–155. doi:10.1002/aaai.12149
-
[35]
Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. 2000. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals.Circulation101, 23 (2000), e215–e220. doi:10.1161/01.CIR.101.23.e215
-
[36]
Jiawei Gu, Xuhui Jiang, Zhichao Shi, Hexiang Tan, Xuehao Zhai, Chengjin Xu, Wei Li, Yinghan Shen, Shengjie Ma, Honghao Liu, et al. 2024. A survey on LLM-as-a-judge.The Innovation(2024). doi:10.1016/j.xinn.2025.101253
-
[37]
Rosita Guido, Stefania Ferrisi, Danilo Lofaro, and Domenico Conforti. 2024. An overview on the advancements of support vector machine models in healthcare applications: A review.Information15, 4 (2024), 235
2024
-
[38]
Hajar Hakkoum, Ali Idri, and Ibtissam Abnane. 2024. Global and local interpretability techniques of supervised machine learning black box models for numerical medical data.Engineering Applications of Artificial Intelligence131 (2024), 107829. doi:10.1016/j.engappai.2023.107829
- [39]
-
[40]
2002.Accuracy and stability of numerical algorithms
Nicholas J Higham. 2002.Accuracy and stability of numerical algorithms. SIAM
2002
-
[41]
Robert R Hoffman, Mohammadreza Jalaeian, Connor Tate, Gary Klein, and Shane T Mueller. 2023. Evaluating machine-generated explanations: a “Scorecard” method for XAI measurement science.Frontiers in Computer Science5 (2023), 1114806. doi:10.3389/fcomp.2023.1114806
-
[42]
Andreas Holzinger, Georg Langs, Helmut Denk, Kurt Zatloukal, and Heimo Müller. 2019. Causability and explainability of artificial intelligence in medicine.WIREs Data Mining and Knowledge Discovery9, 4 (2019), e1312. doi:10.1002/widm.1312
- [43]
-
[44]
Jicheng Huang, Yufeng Cai, Xusheng Wu, Xin Huang, Jianwei Liu, and Dehua Hu. 2024. Prediction of mortality events of patients with acute heart failure in intensive care unit based on deep neural network.Computer Methods and Programs in Biomedicine256 (2024), 108403. NEURON: A Neuro-symbolic System for Grounded Clinical Explainability 17 doi:10.1016/j.cmpb...
-
[45]
Shaoyan Huang, Qiuwang Zhang, Lei Liu, Michael JB Kutryk, and Jianzhong Zhang. 2025. Relationship between albumin-corrected anion gap and short-and medium-term all-cause mortality in heart failure patients with a single ICU admission.Frontiers in Cardiovascular Medicine12 (2025), 1608383. doi:10.3389/fcvm.2025.1608383
-
[46]
Sujeong Hur, Yura Lee, Joongheum Park, Yeong Jeong Jeon, Jong Ho Cho, Duck Cho, Dobin Lim, Wonil Hwang, Won Chul Cha, and Junsang Yoo
-
[47]
doi:10.1038/s41746-025-01958-8
Comparison of SHAP and clinician friendly explanations reveals effects on clinical decision behaviour.NPJ Digital Medicine8, 1 (2025), 578. doi:10.1038/s41746-025-01958-8
-
[48]
Alon Jacovi and Yoav Goldberg. 2020. Towards faithfully interpretable NLP systems: How should we define and evaluate faithfulness?. In Proceedings of the 58th annual meeting of the Association for Computational Linguistics. 4198–4205. doi:10.18653/v1/2020.acl-main.386
-
[49]
Pengcheng Jiang, Cao (Danica) Xiao, Minhao Jiang, Parminder Bhatia, Taha Kass-Hout, Jimeng Sun, and Jiawei Han. 2025. Reasoning-enhanced healthcare predictions with knowledge graph community retrieval. InProceedings of the Thirteenth International Conference on Learning Represen- tations (ICLR)
2025
-
[50]
Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Brian Gow, Benjamin Moody, Steven Horng, Leo Anthony Celi, and Roger Mark. 2024. MIMIC-IV. PhysioNet. doi:10.13026/kpb9-mt58 Version 3.1
-
[51]
Alistair EW Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. 2023. MIMIC-IV, a freely accessible electronic health record dataset.Scientific Data10, 1 (2023), 1. doi:10.1038/s41597-022-01899-x
-
[52]
Md Abdul Kadir, Amir Mosavi, and Daniel Sonntag. 2023. Evaluation metrics for XAI: A review, taxonomy, and practical applications. In2023 IEEE 27th International Conference on Intelligent Engineering Systems (INES). IEEE, 000111–000124. doi:10.1109/INES59282.2023.10297629
-
[53]
Harmanpreet Kaur, Eytan Adar, Eric Gilbert, and Cliff Lampe. 2022. Sensible AI: Re-imagining interpretability and explainability using sensemaking theory. InProceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 702–714. doi:10.1145/3531146.3533135
-
[54]
Gwladys Kelodjou, Laurence Rozé, Véronique Masson, Luis Galárraga, Romaric Gaudel, Maurice Tchuente, and Alexandre Termier. 2024. Shaping up SHAP: enhancing stability through layer-wise neighbor selection. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 13094–13103. doi:10.1609/aaai.v38i12.29208
-
[55]
Arif Khwaja. 2012. KDIGO clinical practice guidelines for acute kidney injury.Nephron Clinical Practice120, 4 (2012), c179–c184. doi:10.1159/ 000339789
2012
-
[56]
Sujung Lee, Won Ik Cho, Youngrong Lee, Duck Ju Kim, Kyeng Hyun Nam, Sangmin Lee, Jungyo Suh, and Taehoon Ko. 2025. A prompt framework for enhancing LLM-based explainability of medical machine learning models: an intensive care unit application.BMC Medical Informatics and Decision Making25, 1 (2025), 430. doi:10.1186/s12911-025-03239-6
-
[57]
Marcel Levi and Steven M Opal. 2006. Coagulation abnormalities in critically ill patients.Critical Care10, 4 (2006), 222. doi:10.1186/cc4975
-
[58]
Avivit Levy, B Riva Shalom, and Michal Chalamish. 2025. A guide to similarity measures and their data science applications.Journal of Big Data 12, 1 (2025), 188. doi:10.1186/s40537-025-01227-1
-
[59]
Jun Li, Yiwu Sun, Jie Ren, Yifan Wu, and Zhaoyi He. 2025. Machine learning for in-hospital mortality prediction in critically ill patients with acute heart failure: A retrospective analysis based on the MIMIC-IV database.Journal of Cardiothoracic and Vascular Anesthesia39, 3 (2025), 666–674. doi:10.1053/j.jvca.2024.12.016
-
[60]
Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, et al. 2024. Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Information Fusion106 (2024), 102301. doi:10.1016/j.inff...
-
[61]
Zeying Lou, Fanghua Zeng, Wenbao Huang, Li Xiao, Kang Zou, and Huasheng Zhou. 2024. Association between the anion-gap and 28-day mortality in critically ill adult patients with sepsis: A retrospective cohort study.Medicine103, 30 (2024), e39029. doi:10.1097/MD.0000000000039029
- [62]
-
[63]
Lundberg and Su-In Lee
Scott M. Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. InProceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS). 4768–4777
2017
-
[64]
Alexandre Mebazaa, Markku S Nieminen, Milton Packer, Alain Cohen-Solal, Franz X Kleber, Stuart J Pocock, Roopal Thakkar, Robert J Padley, Pentti Põder, Matti Kivikko, et al. 2007. Levosimendan vs dobutamine for patients with acute decompensated heart failure: the SURVIVE Randomized Trial.JAMA297, 17 (2007), 1883–1891. doi:10.1001/jama.297.17.1883
-
[65]
Munib Mesinovic, Peter Watkinson, and Tingting Zhu. 2025. Explainable machine learning for predicting ICU mortality in myocardial infarction patients using pseudo-dynamic data.Scientific Reports15, 1 (2025), 27887. doi:10.1038/s41598-025-13299-3
-
[66]
Riccardo Miotto, Li Li, Brian A Kidd, and Joel T Dudley. 2016. Deep patient: An unsupervised representation to predict the future of patients from the electronic health records.Scientific Reports6, 1 (2016), 26094. doi:10.1038/srep26094
-
[67]
Kayo Misumi, Yuya Matsue, Kazutaka Nogi, Yudai Fujimoto, Nobuyuki Kagiyama, Takatoshi Kasai, Takeshi Kitai, Shogo Oishi, Eiichi Akiyama, Satoshi Suzuki, et al. 2023. Derivation and validation of a machine learning-based risk prediction model in patients with acute heart failure. Journal of Cardiology81, 6 (2023), 531–536. doi:10.1016/j.jjcc.2023.02.006
-
[68]
Sina Mohseni, Jeremy E Block, and Eric Ragan. 2021. Quantitative evaluation of machine learning explanations: A human-grounded benchmark. InProceedings of the 26th International Conference on Intelligent User Interfaces. 22–31. doi:10.1145/3397481.3450689
-
[69]
2020.Interpretable machine learning
Christoph Molnar. 2020.Interpretable machine learning. Lulu.com. 18 Anuradha Chandrasekaran, Dimitrios Zikos, Mutlu Mete, Alan Pang, Brady D. Lund, and Kewei Sha
2020
-
[70]
Ramesh Nadarajah, Tanina Younsi, Elizabeth Romer, Keerthenan Raveendra, Yoko M Nakao, Kazuhiro Nakao, Farag Shuweidhi, David C Hogg, Ronen Arbel, Doron Zahger, et al. 2023. Prediction models for heart failure in the community: A systematic review and meta-analysis.European Journal of Heart Failure25, 10 (2023), 1724–1738. doi:10.1002/ejhf.2970
-
[71]
National Library of Medicine. 2025. SNOMED CT United States Edition. https://www.nlm.nih.gov/healthit/snomedct/index.html Accessed: Dec. 8, 2025
2025
-
[72]
Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jörg Schlötterer, Maurice Van Keulen, and Christin Seifert. 2023. From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI.Comput. Surveys55, 13s (2023), 1–42. doi:10.1145/3583558
-
[73]
Sidra Naveed, Gunnar Stevens, and Dean Robin-Kern. 2024. An overview of the empirical evaluation of explainable AI (XAI): A comprehensive guideline for user-centered evaluation in XAI.Applied Sciences14, 23 (2024), 11288. doi:10.3390/app142311288
-
[74]
Fnu Neha, Deepshikha Bhati, and Deepak Kumar Shukla. 2025. Retrieval-augmented generation (RAG) in healthcare: A comprehensive review.AI 6, 9 (2025), 226. doi:10.3390/ai6090226
-
[75]
Ke Niu, You Lu, Xueping Peng, and Jingni Zeng. 2022. Fusion of sequential visits and medical ontology for mortality prediction.Journal of Biomedical Informatics127 (2022), 104012. doi:10.1016/j.jbi.2022.104012
-
[76]
Sarah Nogueira, Konstantinos Sechidis, and Gavin Brown. 2018. On the stability of feature selection algorithms.Journal of Machine Learning Research18, 174 (2018), 1–54
2018
-
[77]
Atsushi Ohgushi, Takayuki Ohtani, Natsumi Nakayama, Shigeo Asai, Yoshiyuki Ishii, Atsuo Namiki, Manabu Akazawa, and Hirotoshi Echizen
-
[78]
Risk of major bleeding at different PT-INR ranges in elderly Japanese patients with non-valvular atrial fibrillation receiving warfarin: a nested case-control study.Journal of Pharmaceutical Health Care and Sciences2, 1 (2016), 2. doi:10.1186/s40780-015-0036-1
-
[79]
Evandro S Ortigossa, Thales Gonçalves, and Luis Gustavo Nonato. 2024. Explainable artificial intelligence (XAI)—From theory to methods and applications.IEEE Access12 (2024), 80799–80846
2024
-
[80]
Ciyuan Peng, Feng Xia, Mehdi Naseriparsa, and Francesco Osborne. 2023. Knowledge graphs: Opportunities and challenges.Artificial Intelligence Review56, 11 (2023), 13071–13102. doi:10.1007/s10462-023-10465-9
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.