Grounded language-image pre-training
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.CV 2
verdicts: CONDITIONAL 2
roles: background 1
polarities: background 1
citing papers explorer
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
  A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.
- SynSpill: Improved Industrial Spill Detection With Synthetic Data
  SynSpill synthetic data enables PEFT of VLMs and boosts YOLO and DETR detectors for industrial spill detection, making their performance comparable after training.