Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. · 2021

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 2 · background: 1

citation-polarity summary

use method: 2 · background: 1

representative citing papers

Why CNN Features Are not Gaussian: A Statistical Anatomy of Deep Representations

cs.CV · 2024-11-07 · unverdicted · novelty 6.0

CNN feature activations follow long-tailed Weibull-like distributions with increasing tail dependence by depth rather than Gaussian, indicating a Matthew process that concentrates signal in tails.

IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models

cs.CV · 2023-08-13 · unverdicted · novelty 6.0

IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled cross-attention adapter that works alongside text prompts and other controls.

ModelScope Text-to-Video Technical Report

cs.CV · 2023-08-12 · unverdicted · novelty 4.0

ModelScopeT2V is a 1.7-billion-parameter text-to-video model built on Stable Diffusion that adds temporal modeling and outperforms prior methods on three evaluation metrics.

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

cs.CV · 2023-08-02 · unverdicted · novelty 4.0

OpenFlamingo provides open-source autoregressive vision-language models that achieve 80-89% of Flamingo performance on seven vision-language datasets.

citing papers explorer

Why CNN Features Are not Gaussian: A Statistical Anatomy of Deep Representations · cs.CV · 2024-11-07 · unverdicted · none · ref 57
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models · cs.CV · 2023-08-13 · unverdicted · none · ref 8
ModelScope Text-to-Video Technical Report · cs.CV · 2023-08-12 · unverdicted · none · ref 42
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models · cs.CV · 2023-08-02 · unverdicted · none · ref 30
