A single transformer combines language modeling loss and diffusion loss on mixed-modality data, scaling to 7B parameters and 2T tokens while matching specialized language and diffusion models.
Hellaswag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL-2019)
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2024 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
A single transformer combines language modeling loss and diffusion loss on mixed-modality data, scaling to 7B parameters and 2T tokens while matching specialized language and diffusion models.