A bag-of-position-tagged-words embedding guides text-to-image diffusion models as effectively as full contextual text embeddings from standard encoders.
arXiv:2305.00000 (2023)
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
CatalyticMLLM unifies property prediction and inverse design for catalytic materials inside one graph-text multimodal LLM and reports better performance than decoupled baselines.
citing papers explorer
- Text-to-Image Models Need Less from Text Encoders Than You Think
  A bag-of-position-tagged-words embedding guides text-to-image diffusion models as effectively as full contextual text embeddings from standard encoders.
- CatalyticMLLM: A Graph-Text Multimodal Large Language Model for Catalytic Materials
  CatalyticMLLM unifies property prediction and inverse design for catalytic materials inside one graph-text multimodal LLM and reports better performance than decoupled baselines.