LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties
Pith reviewed 2026-05-24 06:35 UTC · model grok-4.3
The pith
Large language models assist with diagnosis and treatment in cancer care, dermatology, dental care, neurodegenerative disorders, and mental health.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs have become pivotal in supporting healthcare, including physicians, healthcare providers, and patients, with applications in diagnostic and treatment-related functionalities across cancer care, dermatology, dental care, neurodegenerative disorders, and mental health. The review explores the challenges and opportunities associated with integrating LLMs in healthcare and provides an overview of handling diverse data types within the medical field.
What carries the argument
Organized review that groups LLM uses by medical specialty and separates diagnostic functions from treatment functions.
If this is right
- LLMs can already contribute to diagnostic and treatment decisions inside the listed specialties.
- Successful integration requires addressing documented challenges around data handling and clinical reliability.
- The models show potential to aid both providers and patients once limitations are managed.
- Diverse medical data types must be accommodated for the applications to scale.
Where Pith is reading between the lines
- If the reviewed applications prove reliable in practice, LLMs could shift physician time away from routine documentation toward complex cases.
- The same review structure could be extended to additional specialties not covered here to test consistency of benefits.
- Real-world trials that measure patient outcomes against the review's claims would provide the next direct test.
Load-bearing premise
The studies chosen and summarized in the review give an accurate picture of real LLM use in those specialties without major gaps or one-sided selection.
What would settle it
A broader literature search that finds either no published applications in one of the five specialties or that the cited papers systematically overstate what the models can currently do.
Original abstract
We aim to present a comprehensive overview of the latest advancements in utilizing Large Language Models (LLMs) within the healthcare sector, emphasizing their transformative impact across various medical domains. LLMs have become pivotal in supporting healthcare, including physicians, healthcare providers, and patients. Our review provides insight into the applications of Large Language Models (LLMs) in healthcare, specifically focusing on diagnostic and treatment-related functionalities. We shed light on how LLMs are applied in cancer care, dermatology, dental care, neurodegenerative disorders, and mental health, highlighting their innovative contributions to medical diagnostics and patient care. Throughout our analysis, we explore the challenges and opportunities associated with integrating LLMs in healthcare, recognizing their potential across various medical specialties despite existing limitations. Additionally, we offer an overview of handling diverse data types within the medical field.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a narrative review of Large Language Models (LLMs) in healthcare, claiming to offer a comprehensive overview of their applications in diagnostic and treatment functionalities across cancer care, dermatology, dental care, neurodegenerative disorders, and mental health. It discusses challenges, opportunities, integration issues, and handling of diverse medical data types while asserting that LLMs have become pivotal for physicians, providers, and patients.
Significance. If the synthesis accurately represents the literature without major omissions or bias, the review could provide a useful high-level map of LLM use cases across multiple medical specialties for researchers entering the area. The paper explicitly covers five distinct domains and flags practical challenges, which are strengths for an overview piece, though the lack of systematic methods reduces its value as a definitive reference.
major comments (2)
- [Abstract] Abstract and opening sections: the claim to deliver a 'comprehensive overview' and to highlight 'transformative impact' is not supported by any description of literature search strategy, databases queried, date ranges, inclusion/exclusion criteria, or synthesis method. This directly weakens the central assertion that the cited works establish LLMs as 'pivotal' across the listed specialties.
- [Specialty application sections] Throughout the specialty sections (cancer care, dermatology, etc.): applications are described via selected citations, but without a transparent selection process it is impossible to determine whether negative results, failed deployments, or underrepresented sub-areas were systematically considered, undermining the reliability of the 'pivotal' and 'innovative contributions' statements.
minor comments (1)
- [Title] The title has unusual spacing around the colon ('LLMs-Healthcare :'); consider standardizing the formatting.
Simulated Author's Rebuttal
We thank the referee for these comments on transparency. The manuscript is a narrative review, not a systematic one, and we agree the current wording overstates the scope. We will revise the abstract and introduction and add a brief review-approach statement. These changes will clarify the narrative nature of the review, qualify the claims of comprehensiveness and pivotal status, and note that citations were selected to illustrate key applications and challenges rather than to cover all outcomes exhaustively.
Point-by-point responses
- Referee: [Abstract] Abstract and opening sections: the claim to deliver a 'comprehensive overview' and to highlight 'transformative impact' is not supported by any description of literature search strategy, databases queried, date ranges, inclusion/exclusion criteria, or synthesis method. This directly weakens the central assertion that the cited works establish LLMs as 'pivotal' across the listed specialties.
  Authors: We agree the abstract and opening sections do not describe a systematic search process. This manuscript was prepared as a narrative review synthesizing selected recent literature on LLM applications across the five specialties. We will revise the abstract to replace 'comprehensive overview' with 'narrative overview of selected applications' and remove or qualify 'transformative impact' and 'pivotal' language. A new short subsection on review approach will be added stating that references were chosen based on prominence and relevance to diagnostic/treatment uses, without systematic database queries or formal inclusion criteria.
  Revision: yes
- Referee: [Specialty application sections] Throughout the specialty sections (cancer care, dermatology, etc.): applications are described via selected citations, but without a transparent selection process it is impossible to determine whether negative results, failed deployments, or underrepresented sub-areas were systematically considered, undermining the reliability of the 'pivotal' and 'innovative contributions' statements.
  Authors: The referee is correct that the selection criteria are not stated and that the sections emphasize reported applications. As a narrative review, the intent was to map prominent use cases rather than perform a balanced systematic assessment of successes and failures. We will add language in the introduction and challenges section acknowledging that the cited works represent positive or innovative examples from the literature, that publication bias may exist, and that failed deployments are not systematically reviewed here. Statements about 'innovative contributions' will be qualified accordingly.
  Revision: yes
Circularity Check
No circularity: purely descriptive narrative review with no derivations or fitted claims
Full rationale
This is a narrative literature review summarizing LLM applications across medical specialties. It contains no equations, no parameter fitting, no predictions derived from inputs, and no load-bearing self-citations that reduce the central claims to prior author work by construction. The circularity patterns (self-definitional, fitted-input-called-prediction, uniqueness-imported, etc.) do not apply. The paper is self-contained as a descriptive overview; any concerns about search methodology or selection bias fall under correctness or completeness risk, not circularity per the hard rules.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 3 Pith papers
- Uncertainty-Aware Foundation Models for Clinical Data
  The work introduces uncertainty-aware foundation models for clinical data by learning set-valued patient representations that enforce consistency across partial observations and integrate multimodal self-supervised ob...
- Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs
  LLMs can infer individual domain knowledge from Slack logs with best MAE of 21.13% using Gemini 2.5 Flash, though performance varies by model and shows weak dependence on message volume.
- First, Do No Harm (With LLMs): Mitigating Racial Bias via Agentic Workflows
  All five tested LLMs deviated from US race-stratified disease distributions in synthetic case generation, while retrieval-based agentic workflows improved mean p-value by 0.0348, median p-value by 0.1166, and mean dif...