The MatMMExtract pipeline creates the MatSciFig dataset of 391k annotated materials science figure panels and the MaterialScope detection dataset with high accuracy.
Open-pmc-18m: A high-fidelity large scale medical dataset for multimodal representation learning. arXiv preprint arXiv:2506.02738, 2025
3 Pith papers cite this work.
years: 2026 (3)
representative citing papers
OpenMedReason supplies a large open corpus of multimodal medical reasoning examples extracted from scientific articles, paired with a benchmark that measures perception, knowledge, and rationale quality, yielding 20% VQA gains after supervised fine-tuning.
PMC-InterCPT builds a context-grounded biomedical interleaved corpus from PMC literature and shows it improves multimodal performance on Qwen3.5-4B-Base after CPT and SFT while using fewer tokens.
citing papers explorer
- PMC-InterCPT: Rethinking Biomedical Interleaved Data for Multimodal Continued Pretraining