arxiv: 2605.25566 · v1 · pith:QAJ4ZVHTnew · submitted 2026-05-25 · 💻 cs.AI

Uncertainty Reasoning with Large Language Models for Explainable Disease Diagnosis

Pith reviewed 2026-06-29 21:47 UTC · model grok-4.3

classification 💻 cs.AI
keywords neuro-symbolic reasoning · fuzzy logic · explainable medical AI · large language models · logic programming · verifiable diagnosis · clinical decision support

The pith

A neuro-symbolic framework aligns LLMs with formal logic to produce explainable and verifiable disease diagnoses from patient narratives.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that uses LLMs to pull medical details from patient stories and converts them into symbolic rules based on fuzzy logic. This setup allows two stages of reasoning: first generalizing patterns from the data, then checking the conclusions with a logic engine. The result is diagnoses whose steps can be checked and changed if needed, addressing the lack of transparency in standard LLMs. A reader would care because medical decisions need to be trustworthy and open to review by doctors. The approach aims to match the accuracy of top LLMs while adding formal verifiability.

Core claim

Patient descriptions and clinical guidelines are embedded into a neural knowledge base where LLMs extract structured medical entities, temporal relations, and fuzzy symptom patterns. These are decoded into a symbolic knowledge base in fuzzy logic and declarative rules. Two-stage reasoning consists of inductive symbolic generalization to capture diagnostic patterns and inference verification via a logic programming engine. Symptoms are treated as fuzzy predicates with probabilistic weights, producing auditable, adjustable inference paths compatible with physician feedback and supporting iterative refinement through formal rules.
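
To make the idea of symptoms as weighted fuzzy predicates concrete, here is a minimal Python sketch. It is not the paper's implementation: the class `FuzzyRule`, the function `fire`, the membership degrees, the rule weight, and the use of the min t-norm for conjunction are all illustrative assumptions, since this summary does not give the formal semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyRule:
    """Hypothetical rule: diagnosis <- all listed symptoms, scaled by a weight."""
    diagnosis: str
    symptoms: tuple
    weight: float  # assumed rule confidence in [0, 1]


def fire(rule, facts):
    """Degree of the diagnosis under the min t-norm (an assumed choice).

    Symptoms missing from `facts` count as degree 0.0.
    """
    degrees = [facts.get(name, 0.0) for name in rule.symptoms]
    if not degrees:
        return 0.0
    return rule.weight * min(degrees)


# Membership degrees an extraction step might assign (made-up values).
facts = {"fever": 0.8, "productive_cough": 0.6, "chest_pain": 0.2}
rule = FuzzyRule("pneumonia", ("fever", "productive_cough"), weight=0.9)
score = fire(rule, facts)  # 0.9 * min(0.8, 0.6)
```

Under these assumptions the weakest symptom bounds the conclusion, so a single low-confidence extraction pulls the whole diagnosis down, which is one reason the extraction step matters so much.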

What carries the argument

Neuro-symbolic reasoning framework that decodes LLM outputs into fuzzy logic predicates and declarative rules for two-stage inductive and verificatory inference.

If this is right

  • Inference paths are auditable, adjustable, and compatible with physician feedback.
  • Misalignments between generated diagnoses and ground truth can be traced and corrected via formal rules.
  • The system achieves performance comparable to state-of-the-art LLMs on public benchmarks while adding interpretability.
  • It supports strong generalization and verifiable step-by-step reasoning chains.
  • The framework reconciles symbolic reasoning with LLMs for real-world clinical narratives.
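
What "auditable and adjustable" could look like in practice is sketched below. This is a hedged illustration, not the paper's engine: the function `diagnose`, the rule format (a non-empty symptom list plus a weight), and all numeric values are assumptions made for the example.

```python
def diagnose(rules, facts):
    """Score every rule and record each step so the path can be inspected.

    `rules` maps a diagnosis to (non-empty symptom list, weight); both are
    hypothetical. Conjunction uses min, an assumed t-norm.
    """
    results = {}
    for diagnosis, (symptoms, weight) in rules.items():
        steps = [(name, facts.get(name, 0.0)) for name in symptoms]
        weakest = min(steps, key=lambda step: step[1])
        results[diagnosis] = {
            "score": weight * weakest[1],
            "steps": steps,  # every symptom degree that was consulted
            "limiting_symptom": weakest[0],
            "rule_weight": weight,
        }
    return results


facts = {"fever": 0.8, "cough": 0.6, "wheeze": 0.3}
rules = {
    "pneumonia": (["fever", "cough"], 0.9),
    "asthma": (["cough", "wheeze"], 0.7),
}
before = diagnose(rules, facts)

# A reviewer who distrusts the asthma rule can lower its weight and re-run.
rules["asthma"] = (["cough", "wheeze"], 0.4)
after = diagnose(rules, facts)
```

Because every score comes with the symptom degrees and the rule weight that produced it, a change to one rule affects only the diagnoses that use it, and the effect can be read off by comparing `before` and `after`.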

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Doctors could modify the declarative rules to add new clinical insights directly.
  • The method could be tested on private hospital data to check robustness beyond public benchmarks.
  • Fuzzy weights might allow modeling uncertainty in multi-disease scenarios by combining multiple paths.
  • Similar alignment could improve LLM use in other regulated fields like legal reasoning.
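
The multi-path idea in the list above can be sketched as follows. The aggregation used here, the probabilistic sum 1 - Π(1 - s), is an assumption chosen for illustration; neither the paper nor this summary says how paths would be combined, and `combine_paths` is a hypothetical name.

```python
from collections import defaultdict


def combine_paths(path_scores):
    """Merge the scores of several inference paths per diagnosis.

    Uses the probabilistic sum 1 - prod(1 - s), an assumed aggregation;
    max() would be the other common fuzzy-OR choice.
    """
    remaining = defaultdict(lambda: 1.0)
    for diagnosis, score in path_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {diagnosis}: {score}")
        remaining[diagnosis] *= 1.0 - score
    return {d: 1.0 - r for d, r in remaining.items()}


# Two independent paths support pneumonia, one supports bronchitis (made-up scores).
paths = [("pneumonia", 0.5), ("pneumonia", 0.4), ("bronchitis", 0.3)]
combined = combine_paths(paths)
ranked = sorted(combined, key=combined.get, reverse=True)
```

The probabilistic sum rewards corroboration from several paths, whereas `max` would report only the strongest one; which behavior is clinically appropriate is an open design question.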

Load-bearing premise

LLMs can accurately extract structured medical entities, temporal relations, and fuzzy symptom patterns from natural language patient narratives without introducing critical errors or information loss.
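
A small sketch shows what can and cannot be checked mechanically at this step. The JSON layout, the relation vocabulary in `ALLOWED_RELATIONS`, and the function `parse_extraction` are hypothetical; a hard-coded string stands in for real LLM output.

```python
import json

ALLOWED_RELATIONS = {"before", "after", "overlaps"}  # assumed vocabulary


def parse_extraction(raw):
    """Parse a hypothetical LLM JSON output and reject malformed records."""
    data = json.loads(raw)
    symptoms = {}
    for item in data["symptoms"]:
        degree = float(item["degree"])
        if not 0.0 <= degree <= 1.0:
            raise ValueError(f"degree out of [0, 1] for {item['name']}")
        symptoms[item["name"]] = degree
    relations = []
    for first, relation, second in data.get("temporal", []):
        if relation not in ALLOWED_RELATIONS:
            raise ValueError(f"unknown temporal relation: {relation}")
        if first not in symptoms or second not in symptoms:
            raise ValueError(f"relation mentions unextracted symptom: {first}, {second}")
        relations.append((first, relation, second))
    return symptoms, relations


raw = '''{
  "symptoms": [{"name": "fever", "degree": 0.8}, {"name": "cough", "degree": 0.6}],
  "temporal": [["fever", "before", "cough"]]
}'''
symptoms, relations = parse_extraction(raw)
```

Checks of this kind catch malformed output only. An extraction that is well-formed but wrong, such as a plausible degree assigned to a symptom the patient never mentioned, passes untouched, which is exactly why the premise is load-bearing.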

What would settle it

Finding cases where the LLM extraction step produces incomplete or inaccurate symbolic representations that cause the logic engine to output wrong diagnoses not caught by verification.
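
One piece of such a test could be a comparison between extracted facts and a hand-annotated gold standard. The sketch below assumes gold annotations exist in the same symptom-to-degree form; the function `extraction_errors`, the `tolerance` value, and the example data are all made up for illustration.

```python
def extraction_errors(extracted, gold, tolerance=0.2):
    """Compare extracted symptom degrees with a gold annotation.

    `tolerance` is an arbitrary illustrative threshold on degree deviation.
    """
    missing = sorted(set(gold) - set(extracted))
    spurious = sorted(set(extracted) - set(gold))
    deviating = sorted(
        name
        for name in set(gold) & set(extracted)
        if abs(gold[name] - extracted[name]) > tolerance
    )
    return {"missing": missing, "spurious": spurious, "deviating": deviating}


gold = {"fever": 0.9, "cough": 0.7, "night_sweats": 0.6}
extracted = {"fever": 0.8, "cough": 0.2, "rash": 0.5}
report = extraction_errors(extracted, gold)
```

The decisive cases would be those where a non-empty report coincides with a wrong final diagnosis that the logic engine still accepts.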

Figures

Figures reproduced from arXiv: 2605.25566 by Jin Song Dong, Xiaoyang Fan, Yufan Cai, Zhe Hou.

Figure 1: The Neuro-symbolic Cycle.
Figure 2: The Medical Diagnosis Framework.
Abstract

Clinical decision-making requires reasoning over incomplete, imprecise, and linguistically expressed patient narratives. While large language models (LLMs) excel at extracting latent information from natural language, they lack the verifiability and interpretability essential for trustworthy medical AI. We propose a neuro-symbolic reasoning framework that aligns LLMs with formal logic to enable explainable and formally verifiable medical diagnosis. Patient descriptions and clinical guidelines are embedded into a neural knowledge base, where LLMs extract structured medical entities, temporal relations, and fuzzy symptom patterns, which are decoded into a symbolic knowledge base expressed in fuzzy logic and declarative rules. We perform two-stage reasoning: (1) inductive symbolic generalization to capture diagnostic patterns from encoded narratives, and (2) inference verification via a logic programming engine to derive and validate diagnoses consistent with clinical standards. Each symptom is treated as a fuzzy predicate with probabilistic weights, and inference paths are auditable, adjustable, and compatible with physician feedback. Unlike purely statistical methods, our system supports iterative refinement: misalignment between LLM-generated diagnoses and ground truth can be traced, explained, and corrected through formal rules. By combining logic-based transparency, LLM adaptability, and probabilistic robustness, the framework enables human-aligned healthcare inference with strong generalization and verifiable, step-by-step reasoning chains. We validate our framework on public benchmarks, demonstrating effective reconciliation of symbolic reasoning and LLMs with real-world clinical narratives. Results show performance comparable to state-of-the-art LLMs, while additionally providing interpretable reasoning paths and formally verifiable diagnostic conclusions.
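
The two-stage reasoning described in the abstract can be illustrated with a deliberately small toy. It is a stand-in under stated assumptions, not the paper's method: stage 1 here simply keeps the symptoms that reach a threshold in every labelled case, stage 2 accepts a proposed diagnosis only if the induced rule fires, and the names `induce_rules` and `verify`, the 0.5 thresholds, and the data are hypothetical.

```python
def induce_rules(cases, threshold=0.5):
    """Stage 1 (toy): for each diagnosis keep the symptoms that reach
    `threshold` in every labelled case."""
    rules = {}
    for facts, diagnosis in cases:
        present = {name for name, degree in facts.items() if degree >= threshold}
        rules[diagnosis] = rules[diagnosis] & present if diagnosis in rules else present
    return rules


def verify(rules, facts, proposed, minimum=0.5):
    """Stage 2 (toy): accept a proposed diagnosis only if its induced rule
    has a non-empty body and fires with degree >= `minimum` (min t-norm)."""
    body = rules.get(proposed)
    if not body:
        return False, 0.0
    degree = min(facts.get(name, 0.0) for name in body)
    return degree >= minimum, degree


cases = [
    ({"fever": 0.9, "cough": 0.8, "fatigue": 0.6}, "flu"),
    ({"fever": 0.7, "cough": 0.6, "fatigue": 0.2}, "flu"),
    ({"sneezing": 0.9, "itchy_eyes": 0.8}, "allergy"),
]
rules = induce_rules(cases)

new_patient = {"fever": 0.8, "cough": 0.7}
flu_ok, flu_degree = verify(rules, new_patient, proposed="flu")
allergy_ok, _ = verify(rules, new_patient, proposed="allergy")
```

In this toy, a diagnosis proposed by an upstream model is kept only when a rule learned from earlier cases supports it, which mirrors the separation between generalization and verification even though the real operators are presumably far richer.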

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper proposes a neuro-symbolic framework that embeds patient narratives and clinical guidelines into a neural knowledge base, uses LLMs to extract structured entities, temporal relations, and fuzzy symptom patterns, decodes them into a symbolic KB in fuzzy logic and declarative rules, then applies two-stage reasoning (inductive symbolic generalization followed by logic programming engine verification) to produce auditable, adjustable, and physician-feedback-compatible diagnoses. It claims this yields explainable and formally verifiable medical diagnosis with performance comparable to SOTA LLMs on public benchmarks while adding interpretable reasoning paths.

Significance. If the extraction-to-symbolic step is shown to be reliable and the claimed benchmark results hold with proper controls, the work could meaningfully advance trustworthy clinical AI by combining LLM adaptability with formal verifiability and uncertainty handling via fuzzy predicates. The emphasis on traceable inference paths and iterative refinement addresses a recognized gap in purely neural medical systems. The absence of any quantitative results, error analysis, or benchmark details in the manuscript text prevents assessment of whether these benefits are realized.

major comments (3)
  1. [Abstract / §3] Abstract and implied §3: The central claim of 'formally verifiable' and 'auditable' diagnoses rests on the decoding step producing a faithful symbolic KB from LLM outputs, yet no mechanism, error bounds, or verification procedure is described for detecting LLM hallucinations, temporal relation errors, or fuzzy membership misassignments before the logic engine runs. Any mismatch propagates undetected into the inference verification stage.
  2. [Abstract] Abstract: The statement that the framework was 'validate[d] ... on public benchmarks' with 'performance comparable to state-of-the-art LLMs' is unsupported by any metrics, tables, baseline comparisons, or dataset descriptions, rendering the empirical contribution unevaluable and undermining the claim of effective reconciliation of symbolic reasoning and LLMs.
  3. [Abstract] Abstract: The two-stage reasoning (inductive generalization then logic programming verification) is presented as load-bearing for the verifiability guarantee, but no formal definition of the fuzzy predicates, the inductive generalization operator, or the logic programming engine semantics is supplied, leaving the 'formally verifiable' property without a concrete foundation.
minor comments (1)
  1. The abstract refers to 'Section 3 (implied by abstract)' for the extraction process; explicit section numbering and a high-level architecture diagram would improve readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and will revise the manuscript to incorporate clarifications and additional details where the current presentation is incomplete.

  1. Referee: [Abstract / §3] Abstract and implied §3: The central claim of 'formally verifiable' and 'auditable' diagnoses rests on the decoding step producing a faithful symbolic KB from LLM outputs, yet no mechanism, error bounds, or verification procedure is described for detecting LLM hallucinations, temporal relation errors, or fuzzy membership misassignments before the logic engine runs. Any mismatch propagates undetected into the inference verification stage.

    Authors: We agree that the abstract does not explicitly describe pre-inference detection mechanisms for hallucinations or misassignments. The two-stage process relies on the logic programming engine for verification, but to strengthen the claim we will add a dedicated paragraph in the revised abstract and §3 detailing consistency checks, temporal relation validation against guidelines, and probabilistic thresholds on fuzzy predicates. revision: yes

  2. Referee: [Abstract] Abstract: The statement that the framework was 'validate[d] ... on public benchmarks' with 'performance comparable to state-of-the-art LLMs' is unsupported by any metrics, tables, baseline comparisons, or dataset descriptions, rendering the empirical contribution unevaluable and undermining the claim of effective reconciliation of symbolic reasoning and LLMs.

    Authors: The abstract summarizes results that appear in the experiments section of the full manuscript. However, the referee is correct that the abstract itself provides no metrics or dataset details. We will revise the abstract to include key quantitative results, baseline comparisons, and dataset references. revision: yes

  3. Referee: [Abstract] Abstract: The two-stage reasoning (inductive generalization then logic programming verification) is presented as load-bearing for the verifiability guarantee, but no formal definition of the fuzzy predicates, the inductive generalization operator, or the logic programming engine semantics is supplied, leaving the 'formally verifiable' property without a concrete foundation.

    Authors: Section 3 supplies the formal definitions of fuzzy predicates, the inductive generalization operator, and the semantics of the logic programming engine. The abstract omits these details. We will revise to include a concise formal summary in the abstract to make the foundation explicit without lengthening the text excessively. revision: yes
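
The pre-inference checks promised in the first response (consistency checks, temporal relation validation, thresholds on fuzzy predicates) are not specified anywhere in the text. The following sketch shows one plausible shape for two of them, under assumptions: temporal relations are reduced to "before" pairs and checked for contradictions by cycle detection, and low-degree facts are dropped at an arbitrary cut-off. Both function names are hypothetical.

```python
def has_temporal_cycle(before_pairs):
    """Return True if the 'before' relations contradict each other (contain a cycle)."""
    graph = {}
    for earlier, later in before_pairs:
        graph.setdefault(earlier, set()).add(later)
        graph.setdefault(later, set())
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        state[node] = "visiting"
        for nxt in graph[node]:
            if state.get(nxt) == "visiting":
                return True  # back edge: nxt is an ancestor of node
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in graph)


def filter_by_threshold(facts, minimum=0.3):
    """Drop facts whose degree falls below an assumed cut-off."""
    return {name: degree for name, degree in facts.items() if degree >= minimum}


consistent = [("fever", "cough"), ("cough", "dyspnea")]
contradictory = consistent + [("dyspnea", "fever")]
kept = filter_by_threshold({"fever": 0.8, "rash": 0.1})
```

A check like this detects internally contradictory timelines but says nothing about a timeline that is consistent yet wrong, so it narrows the referee's concern rather than resolving it.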

Circularity Check

0 steps flagged

No circularity: framework proposal is self-contained with external benchmark validation

The paper describes a neuro-symbolic pipeline (LLM entity/relation extraction into fuzzy symbolic KB, followed by inductive generalization and logic-program verification) without any equations, fitted parameters, or predictions that reduce to the inputs by construction. No self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided text. The central claim of verifiability is presented as resting on the logic engine operating on the decoded KB, with performance evaluated on public benchmarks; this constitutes an independent check rather than a circular reduction. The extraction step is an assumption, not a circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that LLM extraction can be losslessly mapped to fuzzy predicates and that fuzzy logic plus logic programming can faithfully represent clinical diagnostic standards.

axioms (2)
  • domain assumption: Fuzzy logic predicates with probabilistic weights can represent uncertain medical symptoms and relations extracted from text.
    Stated in the abstract, where symptoms are treated as fuzzy predicates.
  • domain assumption: A logic programming engine can validate diagnoses against clinical standards once they are encoded symbolically.
    Invoked in the two-stage reasoning description.

pith-pipeline@v0.9.1-grok · 5803 in / 1233 out tokens · 25903 ms · 2026-06-29T21:47:54.288842+00:00 · methodology
