A survey on medical large language models: Technology, application, trustworthiness, and future directions

Liu, L · 2024 · arXiv 2406.03712

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Evaluation of 6233 MedGPTs finds 25-30% with low factual accuracy, 33.6-54.3% violating operational thresholds, and 57% of action-enabled models lacking privacy disclosures.

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

cs.CL · 2026-04-02 · unverdicted · novelty 6.0

A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning

cs.LG · 2026-01-28 · unverdicted · novelty 6.0

LLM agents iteratively generate and optimize data processing strategies for fine-tuning, delivering over 80% win rates versus unprocessed data and 65% versus LLM-based AutoML baselines while cutting search time by up to 10x.

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning

cs.AI · 2026-01-19 · unverdicted · novelty 6.0

CURE-MED pairs a new 13-language medical reasoning benchmark with curriculum RL to raise logical correctness to 70% and language consistency to 95% at 32B scale while outperforming baselines.

SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention

cs.AI · 2025-06-17 · unverdicted · novelty 6.0

SEAT preserves epistemic abstention in LLMs during knowledge adaptation via sparse tuning and entity-perturbed KL regularization, yielding 18-101% better abstention on unknown queries while retaining near-perfect knowledge acquisition.

In-depth Analysis of Graph-based RAG in a Unified Framework

cs.IR · 2025-03-06 · unverdicted · novelty 6.0

A unified framework and large-scale comparison of graph-based RAG methods on QA tasks yields new high-performing variants obtained by recombining existing components.

ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation

cs.IR · 2025-02-14 · unverdicted · novelty 6.0

ArchRAG proposes attributed-community hierarchical indexing and LLM clustering to improve accuracy and lower token usage in graph-based retrieval-augmented generation.

CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation

cs.AI · 2025-10-26 · unverdicted · novelty 4.0

CLIN-LLM combines uncertainty-calibrated BioBERT classification with retrieval-augmented FLAN-T5 generation and safety post-processing to reach 98% accuracy on clinical cases while cutting unsafe antibiotic suggestions by 67%.

QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model

cs.CL · 2025-04-13 · unverdicted · novelty 4.0

QM-ToT applies Tree of Thoughts decomposition and evaluator layers to quantized LLMs, reporting accuracy gains from 34% to 50% on MedQAUSMLE for LLaMA2-70b and from 58.77% to 69.49% for LLaMA-3.1-8b, plus an 86.27% improvement in data distillation using only 3.9% of the data.

citing papers explorer

Showing 9 of 9 citing papers.

Do No Harm? Hallucination and Actor-Level Abuse in Web-Deployed Medical Large Language Models
cs.CL · 2026-05-20 · unverdicted · none · ref 50

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework
cs.CL · 2026-04-02 · unverdicted · none · ref 57

LLM-AutoDP: Automatic Data Processing via LLM Agents for Model Fine-tuning
cs.LG · 2026-01-28 · unverdicted · none · ref 29

CURE-Med: Curriculum-Informed Reinforcement Learning for Multilingual Medical Reasoning
cs.AI · 2026-01-19 · unverdicted · none · ref 15

SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention
cs.AI · 2025-06-17 · unverdicted · none · ref 5

In-depth Analysis of Graph-based RAG in a Unified Framework
cs.IR · 2025-03-06 · unverdicted · none · ref 54

ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation
cs.IR · 2025-02-14 · unverdicted · none · ref 33

CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation
cs.AI · 2025-10-26 · unverdicted · none · ref 17

QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
cs.CL · 2025-04-13 · unverdicted · none · ref 6
