Clinical Camel: An open expert-level medical language model with dialogue-based knowledge encoding

Direct preference optimization: your language model is secretly a reward model · 2022 · arXiv 2305.12031

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.

MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs

cs.CL · 2025-12-23 · unverdicted · novelty 6.0

MediEval benchmark reveals LLM failures like hallucinated support and truth inversion in medical reasoning, while CoRFu fine-tuning raises macro-F1 by 16.4 points and removes truth inversion errors.

Capabilities of Gemini Models in Medicine

cs.AI · 2024-04-29 · unverdicted · novelty 6.0

Med-Gemini sets new records on 10 of 14 medical benchmarks including 91.1% on MedQA-USMLE, beats GPT-4V by 44.5% on multimodal tasks, and surpasses humans on medical text summarization.

citing papers explorer

Showing 3 of 3 citing papers.

CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification cs.CL · 2026-05-05 · unverdicted · none · ref 26
CuraView detects sentence-level faithfulness hallucinations in medical discharge summaries via GraphRAG knowledge graphs and multi-agent evidence grading, achieving 0.831 F1 on critical contradictions with a fine-tuned Qwen3-14B model and 50% relative improvement over baselines.
MediEval: A Unified Medical Benchmark for Patient-Contextual and Knowledge-Grounded Reasoning in LLMs cs.CL · 2025-12-23 · unverdicted · none · ref 2
MediEval benchmark reveals LLM failures like hallucinated support and truth inversion in medical reasoning, while CoRFu fine-tuning raises macro-F1 by 16.4 points and removes truth inversion errors.
Capabilities of Gemini Models in Medicine cs.AI · 2024-04-29 · unverdicted · none · ref 270
Med-Gemini sets new records on 10 of 14 medical benchmarks including 91.1% on MedQA-USMLE, beats GPT-4V by 44.5% on multimodal tasks, and surpasses humans on medical text summarization.

Clinical Camel: An open expert-level medical language model with dialogue-based knowledge encoding

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer