Explainable Prediction of Medical Codes from Clinical Text

Jacob Eisenstein; James Mullenbach; Jimeng Sun; Jon Duke; Sarah Wiegreffe

arXiv: 1802.05695 · v2 · pith:J7IIVJN5new · submitted 2018-02-15 · cs.CL · cs.LG · stat.ML

James Mullenbach, Sarah Wiegreffe, Jon Duke, Jimeng Sun, Jacob Eisenstein

keywords: codes · text · clinical · medical · attention · convolutional · furthermore · mechanism

Clinical notes are text documents that are created by clinicians for each patient encounter. They are typically accompanied by medical codes, which describe the diagnosis and treatment. Annotating these codes is labor intensive and error prone; furthermore, the connection between the codes and the text is not annotated, obscuring the reasons and details behind specific diagnoses and treatments. We present an attentional convolutional network that predicts medical codes from clinical text. Our method aggregates information across the document using a convolutional neural network, and uses an attention mechanism to select the most relevant segments for each of the thousands of possible codes. The method is accurate, achieving precision@8 of 0.71 and a Micro-F1 of 0.54, which are both better than the prior state of the art. Furthermore, through an interpretability evaluation by a physician, we show that the attention mechanism identifies meaningful explanations for each code assignment.
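As a rough illustration of the two ideas the abstract names (an attention mechanism applied separately for each code, and the precision@k metric), here is a minimal pure-Python sketch. It is one plausible reading of the abstract, not the authors' implementation: the names `per_label_attention`, `precision_at_k`, `H` (per-position feature vectors, standing in for convolutional outputs), and `U` (one query vector per code) are all hypothetical, and the dot-product scoring is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def per_label_attention(H, U):
    """Attend over document positions once per label.

    H: list of N position vectors of dimension d (assumed conv outputs).
    U: list of L label query vectors of dimension d (assumed learned).
    Returns (weights, contexts): for each label, the attention weights
    over the N positions and the resulting weighted sum of H.
    """
    dim = len(H[0])
    weights, contexts = [], []
    for u in U:
        # Score every position against this label's query, then normalize.
        alpha = softmax([dot(h, u) for h in H])
        # Label-specific document representation: attention-weighted sum.
        ctx = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(dim)]
        weights.append(alpha)
        contexts.append(ctx)
    return weights, contexts


def precision_at_k(scores, gold, k):
    """Fraction of the k highest-scored labels that are in the gold set."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:k] if i in gold) / k
```

In this sketch the per-label weights double as the explanation: the positions with the largest weight for a given code are the text segments that most influenced that code's representation.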

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free
cs.CL 2026-05 unverdicted novelty 5.0

Retrieval with frozen embeddings and k-NN delivers competitive accuracy, high data efficiency, and zero hallucinations on legal multi-label annotation across ECtHR and Eurlex datasets.

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models
cs.CL 2026-05 unverdicted novelty 4.0

Fine-tuned e5_large LLM reaches 0.866 F1_micro on ICD classification of 145k Spanish psychiatric texts, outperforming BoW, TF-IDF, and other transformers.

Federated Learning for ICD Classification with Lightweight Models and Pretrained Embeddings
cs.IR 2025-07 unverdicted novelty 4.0

Lightweight federated learning with frozen embeddings and MLP heads reaches competitive micro and macro F1 scores for ICD-9 and ICD-10 coding on MIMIC-IV, nearly matching centralized training.