No Culture Left Behind:
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1
polarities
background 1
representative citing papers
The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
citing papers explorer
- Beyond Semantics: Modeling Factual and Affective Perceptual Experiences from Vision-Language Data
  PercepT discovers perceptual topic clusters from vision-language data via unsupervised training and maps images to them with attention pooling, reporting silhouette 0.97 and AUC 0.94 on ArtELingo.
- Multilingual Vision-Language Models, A Survey
  The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.