Deep residual learning for image recognition
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
LGAP generates captions from images to guide diffusion-based purification, outperforming other adversarial defenses without specialized training.
citing papers explorer
- Benchmarking Attribute Discrimination in Infant-Scale Vision-Language Models
  Infant-scale VLMs discriminate size and texture visually but perform poorly on color and struggle to ground attributes in text, while web-scale models excel at color grounding.
- Language Guided Adversarial Purification
  LGAP generates captions from images to guide diffusion-based purification, outperforming other adversarial defenses without specialized training.