
arxiv: 2311.12882 · v3 · submitted 2023-10-28 · 💻 cs.CL · cs.AI

LLMs-Healthcare : Current Applications and Challenges of Large Language Models in various Medical Specialties

Pith reviewed 2026-05-24 06:35 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords large language models · healthcare applications · cancer care · dermatology · dental care · neurodegenerative disorders · mental health · challenges

The pith

Large language models assist with diagnosis and treatment in cancer care, dermatology, dental care, neurodegenerative disorders, and mental health.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review paper establishes that large language models now support physicians, providers, and patients through diagnostic and treatment tasks in five specific medical areas. It maps existing applications, notes how the models process varied medical data, and weighs the practical challenges against the opportunities for wider use. A sympathetic reader would care because the overview shows concrete entry points for AI in everyday clinical work and flags the barriers that still limit adoption. The paper treats these uses as already active rather than purely theoretical.

Core claim

LLMs have become pivotal in supporting healthcare, including physicians, healthcare providers, and patients, with applications in diagnostic and treatment-related functionalities across cancer care, dermatology, dental care, neurodegenerative disorders, and mental health. The review explores the challenges and opportunities associated with integrating LLMs in healthcare and provides an overview of handling diverse data types within the medical field.

What carries the argument

Organized review that groups LLM uses by medical specialty and separates diagnostic functions from treatment functions.

If this is right

  • LLMs can already contribute to diagnostic and treatment decisions inside the listed specialties.
  • Successful integration requires addressing documented challenges around data handling and clinical reliability.
  • The models show potential to aid both providers and patients once limitations are managed.
  • Diverse medical data types must be accommodated for the applications to scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the reviewed applications prove reliable in practice, LLMs could shift physician time away from routine documentation toward complex cases.
  • The same review structure could be extended to additional specialties not covered here to test consistency of benefits.
  • Real-world trials that measure patient outcomes against the review's claims would provide the next direct test.

Load-bearing premise

The studies chosen and summarized in the review give an accurate picture of real LLM use in those specialties without major gaps or one-sided selection.

What would settle it

A broader literature search that finds either no published applications in one of the five specialties or that the cited papers systematically overstate what the models can currently do.

Original abstract

We aim to present a comprehensive overview of the latest advancements in utilizing Large Language Models (LLMs) within the healthcare sector, emphasizing their transformative impact across various medical domains. LLMs have become pivotal in supporting healthcare, including physicians, healthcare providers, and patients. Our review provides insight into the applications of Large Language Models (LLMs) in healthcare, specifically focusing on diagnostic and treatment-related functionalities. We shed light on how LLMs are applied in cancer care, dermatology, dental care, neurodegenerative disorders, and mental health, highlighting their innovative contributions to medical diagnostics and patient care. Throughout our analysis, we explore the challenges and opportunities associated with integrating LLMs in healthcare, recognizing their potential across various medical specialties despite existing limitations. Additionally, we offer an overview of handling diverse data types within the medical field.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a narrative review of Large Language Models (LLMs) in healthcare, claiming to offer a comprehensive overview of their applications in diagnostic and treatment functionalities across cancer care, dermatology, dental care, neurodegenerative disorders, and mental health. It discusses challenges, opportunities, integration issues, and handling of diverse medical data types while asserting that LLMs have become pivotal for physicians, providers, and patients.

Significance. If the synthesis accurately represents the literature without major omissions or bias, the review could provide a useful high-level map of LLM use cases across multiple medical specialties for researchers entering the area. The paper explicitly covers five distinct domains and flags practical challenges, which are strengths for an overview piece, though the lack of systematic methods reduces its value as a definitive reference.

major comments (2)
  1. [Abstract] Abstract and opening sections: the claim to deliver a 'comprehensive overview' and to highlight 'transformative impact' is not supported by any description of literature search strategy, databases queried, date ranges, inclusion/exclusion criteria, or synthesis method. This directly weakens the central assertion that the cited works establish LLMs as 'pivotal' across the listed specialties.
  2. [Specialty application sections] Throughout the specialty sections (cancer care, dermatology, etc.): applications are described via selected citations, but without a transparent selection process it is impossible to determine whether negative results, failed deployments, or underrepresented sub-areas were systematically considered, undermining the reliability of the 'pivotal' and 'innovative contributions' statements.
minor comments (1)
  1. [Title] The title contains unusual spacing around the colon ('LLMs-Healthcare :'); consider standardizing the formatting.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these comments on transparency. The manuscript is a narrative review, not a systematic one, and we agree the current wording overstates the scope. We will revise the abstract and introduction and add a brief review-approach statement to clarify the narrative nature, qualify claims of comprehensiveness and pivotal status, and note that citations were selected to illustrate key applications and challenges rather than to cover all outcomes exhaustively.

Point-by-point responses
  1. Referee: [Abstract] Abstract and opening sections: the claim to deliver a 'comprehensive overview' and to highlight 'transformative impact' is not supported by any description of literature search strategy, databases queried, date ranges, inclusion/exclusion criteria, or synthesis method. This directly weakens the central assertion that the cited works establish LLMs as 'pivotal' across the listed specialties.

    Authors: We agree the abstract and opening sections do not describe a systematic search process. This manuscript was prepared as a narrative review synthesizing selected recent literature on LLM applications across the five specialties. We will revise the abstract to replace 'comprehensive overview' with 'narrative overview of selected applications' and remove or qualify 'transformative impact' and 'pivotal' language. A new short subsection on review approach will be added stating that references were chosen based on prominence and relevance to diagnostic/treatment uses, without systematic database queries or formal inclusion criteria.

    revision: yes

  2. Referee: [Specialty application sections] Throughout the specialty sections (cancer care, dermatology, etc.): applications are described via selected citations, but without a transparent selection process it is impossible to determine whether negative results, failed deployments, or underrepresented sub-areas were systematically considered, undermining the reliability of the 'pivotal' and 'innovative contributions' statements.

    Authors: The referee is correct that the selection criteria are not stated and that the sections emphasize reported applications. As a narrative review, the intent was to map prominent use cases rather than perform a balanced systematic assessment of successes and failures. We will add language in the introduction and challenges section acknowledging that the cited works represent positive or innovative examples from the literature, that publication bias may exist, and that failed deployments are not systematically reviewed here. Statements about 'innovative contributions' will be qualified accordingly.

    revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive narrative review with no derivations or fitted claims

Full rationale

This is a narrative literature review summarizing LLM applications across medical specialties. It contains no equations, no parameter fitting, no predictions derived from inputs, and no load-bearing self-citations that reduce the central claims to prior author work by construction. None of the circularity patterns (self-definitional, fitted-input-called-prediction, uniqueness-imported, etc.) apply. The paper is self-contained as a descriptive overview; under the hard rules, any concerns about search methodology or selection bias count as correctness or completeness risk, not circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review paper; contains no free parameters, axioms, or invented entities as it performs no derivations or modeling.

pith-pipeline@v0.9.0 · 5674 in / 923 out tokens · 13563 ms · 2026-05-24T06:35:18.296719+00:00 · methodology


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Uncertainty-Aware Foundation Models for Clinical Data

    cs.LG 2026-04 unverdicted novelty 6.0

    The work introduces uncertainty-aware foundation models for clinical data by learning set-valued patient representations that enforce consistency across partial observations and integrate multimodal self-supervised ob...

  2. Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs

    cs.CL 2026-05 unverdicted novelty 5.0

    LLMs can infer individual domain knowledge from Slack logs with best MAE of 21.13% using Gemini 2.5 Flash, though performance varies by model and shows weak dependence on message volume.

  3. First, Do No Harm (With LLMs): Mitigating Racial Bias via Agentic Workflows

    cs.CY 2026-04 unverdicted novelty 4.0

    All five tested LLMs deviated from US race-stratified disease distributions in synthetic case generation, while retrieval-based agentic workflows improved mean p-value by 0.0348, median p-value by 0.1166, and mean dif...
