DataComp-VLM benchmark shows instruction-heavy data mixing outperforms filtering for VLM training, with DCVLM-Baseline achieving 63.6% on 33 tasks for 8B models (+5.4pp over FineVision).
Title resolution pending
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2026 (2)
Representative citing papers:
CPI ranks image-text pairs using phrase-level sensitivity scores from nonce substitutions to improve compositional performance in VL pretraining, achieving gains on relation benchmarks with a 50% data subset.
citing papers explorer
- DataComp-VLM: Improved Open Datasets for Vision-Language Models
  DataComp-VLM benchmark shows instruction-heavy data mixing outperforms filtering for VLM training, with DCVLM-Baseline achieving 63.6% on 33 tasks for 8B models (+5.4pp over FineVision).
- What Does the Caption Really Say? Counterfactual Phrase Intervention for Compositional Data Selection in Vision-Language Pretraining
  CPI ranks image-text pairs using phrase-level sensitivity scores from nonce substitutions to improve compositional performance in VL pretraining, achieving gains on relation benchmarks with a 50% data subset.