Prompting Large Pre-trained Vision-Language Models For Compositional Concept Learning

Guangyue Xu; Joyce Chai; Parisa Kordjamshidi

arxiv: 2211.05077 · v1 · pith:DDMSHCGNnew · submitted 2022-11-09 · 💻 cs.CV

Guangyue Xu, Parisa Kordjamshidi, Joyce Chai

classification 💻 cs.CV

keywords learning, compositional, promptcompvl, achieves, czsl, large, model

This work explores the zero-shot compositional learning ability of large pre-trained vision-language models (VLMs) within the prompt-based learning framework and proposes a model, PromptCompVL, to solve the compositional zero-shot learning (CZSL) problem. PromptCompVL makes two design choices. First, it uses soft prompting instead of hard prompting to inject learnable parameters that reprogram VLMs for compositional learning. Second, to address the compositional challenge, it uses a soft-embedding layer to learn primitive concepts in different combinations. By combining soft embedding and soft prompting, PromptCompVL achieves state-of-the-art performance on the MIT-States dataset. Furthermore, the proposed model achieves consistent improvement over other CLIP-based methods, which shows the effectiveness of the proposed prompting strategies for CZSL.
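
The two design choices in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the vector width, the prefix length, the example attribute and object names, and the mean-pooling `encode_text` stand-in for a frozen text encoder are all assumptions made for illustration. The idea shown is only that a prompt is built from learnable prefix vectors (soft prompting) plus learnable attribute and object vectors (soft embedding), and that every attribute-object pair is scored against an image feature by cosine similarity.

```python
import math
import random

DIM = 8         # toy embedding width (assumption; real VLMs use far larger widths)
PREFIX_LEN = 3  # number of learnable soft-prompt vectors (assumption)


def random_vector(rng, dim=DIM):
    """Draw a random vector, standing in for a learnable parameter."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def build_prompt(prefix, attr_vec, obj_vec):
    """Soft prompt: learnable prefix vectors followed by attribute and object vectors."""
    return prefix + [attr_vec, obj_vec]


def encode_text(prompt):
    """Hypothetical stand-in for a frozen text encoder: mean-pool, then L2-normalize."""
    dim = len(prompt[0])
    pooled = [sum(vec[i] for vec in prompt) / len(prompt) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a)) or 1.0
    norm_b = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (norm_a * norm_b)


def classify(image_feat, prefix, attr_emb, obj_emb):
    """Score every attribute-object composition and return the best pair with all scores."""
    scores = {}
    for attr, a_vec in attr_emb.items():
        for obj, o_vec in obj_emb.items():
            text_feat = encode_text(build_prompt(prefix, a_vec, o_vec))
            scores[(attr, obj)] = cosine(image_feat, text_feat)
    return max(scores, key=scores.get), scores


rng = random.Random(0)
prefix = [random_vector(rng) for _ in range(PREFIX_LEN)]
attr_emb = {name: random_vector(rng) for name in ("red", "sliced")}
obj_emb = {name: random_vector(rng) for name in ("apple", "tomato")}

# Toy "image feature": reuse the text feature of one composition so the
# correct answer is known in advance.
image_feat = encode_text(build_prompt(prefix, attr_emb["sliced"], obj_emb["tomato"]))
best, scores = classify(image_feat, prefix, attr_emb, obj_emb)
```

In an actual prompt-tuning setup, the prefix and the attribute and object vectors would be updated by gradient descent while the pre-trained encoders stay frozen; here they are left at their random initial values.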

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds
cs.LG 2026-06 unverdicted novelty 6.0

Compositionality emerges in neural networks only in a narrow depth-connectivity regime, with gradient descent converging to fractured solutions outside it.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation
cs.HC 2026-05 unverdicted novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phase...