MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts
This paper presents the formal release of MedMentions, a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines. In addition to the full corpus, a sub-corpus of MedMentions is also presented, comprising annotations for a subset of UMLS 2017 targeted towards document retrieval. To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described.
Forward citations
Cited by 3 Pith papers
- LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking
  LongBEL improves biomedical entity linking consistency by combining full-document context with memory of previous predictions trained via cross-validation rather than gold labels.
- Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs
  Feature reconstruction in GSSL is robust to noise in text-driven biomedical graphs while relation reconstruction is sensitive, with bidirectional GNN architectures performing better on noisy data and yielding up to 7%...
- A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks
  A dual-purpose benchmark supplies two text-derived knowledge graphs and one expert reference graph on the same biomedical corpus to jointly measure construction method quality and GNN robustness via semi-supervised no...