Data curation alone raises VLM accuracy by more than 11 points on average across many benchmarks while cutting required training compute by up to 87 times.
Unicorn: Text-only data synthesis for vision language model training
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
Modality representations share dominant semantic geometry but have an anisotropic residual gap; AnisoAlign corrects source representations boundedly using target geometry for unpaired alignment.
citing papers explorer
-
20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone
Data curation alone raises VLM accuracy by more than 11 points on average across many benchmarks while cutting required training compute by up to 87 times.
-
Anisotropic Modality Align
Modality representations share dominant semantic geometry but have an anisotropic residual gap; AnisoAlign corrects source representations boundedly using target geometry for unpaired alignment.
- Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models