Med-HALT: Medical Domain Hallucination Test for Large Language Models
Four Pith papers cite this work. Classification of citation polarity is still being indexed.
Citing papers
From Retinal Evidence to Safe Decisions: RETINA-SAFE and ECRT for Hallucination Risk Triage in Medical LLMs
The RETINA-SAFE benchmark and the ECRT two-stage triage method improve hallucination risk detection for retinal decisions made by medical LLMs. Using internal representations and logit shifts, they gain 0.15-0.19 in balanced accuracy over baselines.
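Balanced accuracy, the metric in which the RETINA-SAFE gains are reported, is the mean of per-class recall. Unlike plain accuracy, it is not inflated by a skewed class ratio. The minimal sketch below shows the standard definition only and does not reproduce the paper's evaluation. The function name and the labeling (1 = high hallucination risk, 0 = low risk) are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; in the binary case, the average of TPR and TNR."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of samples whose true label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels: 1 = high hallucination risk, 0 = low risk.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10  # a trivial detector that never flags risk
print(balanced_accuracy(y_true, y_pred))  # 0.5, although plain accuracy is 0.6
```

A trivial detector that always predicts the majority class scores 0.5 regardless of class imbalance. This makes gains expressed in balanced accuracy easier to compare across skewed benchmarks.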
Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms
ELERAG integrates Wikidata entity linking and hybrid re-ranking with Reciprocal Rank Fusion (RRF) into a RAG pipeline. It outperforms baselines on a custom Italian academic dataset, whereas cross-encoder re-ranking methods perform better on the general-domain SQuAD-it dataset.
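Reciprocal Rank Fusion merges several ranked lists by summing 1 / (k + rank) for each document across the lists. The minimal sketch below shows the generic technique, not ELERAG's implementation. The two input rankings (for example, a lexical and a dense retriever) and the constant k = 60, a commonly used default, are assumptions, and the document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; k (assumed 60) damps the weight of top ranks."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from two retrievers.
lexical = ["d1", "d2", "d3"]
dense = ["d3", "d1", "d4"]
print(reciprocal_rank_fusion([lexical, dense]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only rank positions, so retrievers with incomparable score scales can be fused without normalization. A cross-encoder, by contrast, rescores each query-document pair directly.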
Explicit Evidence Grounding via Structured Inline Citation Generation
FullCite introduces three strategies for generating structured inline citations in question answering. On ASQA, BioASQ, and ExpertQA, it finds that LLMs identify relevant documents well but struggle to pinpoint precise evidence spans.
Evidence-Based Intelligent Diagnostic and Therapeutic Visualization System with Large Language Models: Multi-Turn Interaction and Multimodal Treatment Plan Generation
The system integrates a Neo4j knowledge graph, four-stage symptom matching with LLM verification, genetic-algorithm-optimized proactive questioning, and multimodal evidence-based visualizations to improve diagnostic transparency and treatment interpretability in TCM, reporting 32% fewer non-standard