MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts

Sunil Mohan , Donghui Li

Authors on Pith no claims yet

classification 💻 cs.CL cs.LG

keywords biomedicalmedmentionsannotatedconceptsumlscorpusentitylinking

read the original abstract

This paper presents the formal release of MedMentions, a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines. In addition to the full corpus, a sub-corpus of MedMentions is also presented, comprising annotations for a subset of UMLS 2017 targeted towards document retrieval. To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
cs.CL 2026-05 unverdicted novelty 7.0

LongBEL improves biomedical entity linking consistency by combining full-document context with memory of previous predictions trained via cross-validation rather than gold labels.
Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs
cs.LG 2026-05 unverdicted novelty 7.0

Feature reconstruction in GSSL is robust to noise in text-driven biomedical graphs while relation reconstruction is sensitive, with bidirectional GNN architectures performing better on noisy data and yielding up to 7%...
A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks
cs.LG 2026-05 unverdicted novelty 6.0

A dual-purpose benchmark supplies two text-derived knowledge graphs and one expert reference graph on the same biomedical corpus to jointly measure construction method quality and GNN robustness via semi-supervised no...