Med-HALT: Medical Domain Hallucination Test for Large Language Models

Ankit Pal, Logesh Kumar Umapathi, Malaikannan Sankarasubbu · 2023 · DOI 10.18653/v1/2023.conll-1.21

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

RETINA-SAFE benchmark and ECRT two-stage triage improve hallucination risk detection in medical LLMs for retinal decisions by 0.15-0.19 balanced accuracy over baselines using internal representations and logit shifts.

Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

cs.IR · 2025-12-05 · unverdicted · novelty 5.0

ELERAG integrates Wikidata entity linking with hybrid RRF re-ranking into RAG and outperforms baselines on a custom Italian academic dataset while cross-encoder methods win on the general SQuAD-it dataset.

Explicit Evidence Grounding via Structured Inline Citation Generation

cs.CL · 2026-06-05 · unverdicted · novelty 4.0

FullCite introduces three strategies for structured inline citation generation in QA and finds LLMs identify relevant documents well but struggle with precise evidence spans on ASQA, BioASQ, and ExpertQA.

Evidence-Based Intelligent Diagnostic and Therapeutic Visualization System with Large Language Models: Multi-Turn Interaction and Multimodal Treatment Plan Generation

cs.AI · 2026-06-05 · unverdicted · novelty 4.0

The system integrates a Neo4j knowledge graph, four-stage symptom matching with LLM verification, genetic-algorithm-optimized proactive questioning, and multimodal evidence-based visualizations to improve diagnostic transparency and treatment interpretability in TCM, reporting 32% fewer non-standard

citing papers explorer

Showing 4 of 4 citing papers.

From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs

cs.AI · 2026-04-07 · unverdicted · none · ref 11

Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms

cs.IR · 2025-12-05 · unverdicted · none · ref 4

Explicit Evidence Grounding via Structured Inline Citation Generation

cs.CL · 2026-06-05 · unverdicted · none · ref 28

Evidence-Based Intelligent Diagnostic and Therapeutic Visualization System with Large Language Models: Multi-Turn Interaction and Multimodal Treatment Plan Generation

cs.AI · 2026-06-05 · unverdicted · none · ref 74
