FontFusion adds hierarchical token conditioning, position-aware embeddings, and multi-level dropping to DiT diffusion models, yielding 76% relative gains on decorative fonts and 68-76% consistency improvements via a dual DeepFont+DINOv2 encoder.
In: Proceedings of the 41st International Conference on Machine Learning
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
FontFusion: Enhancing Generative Text in Diffusion Models with Typographic Conditioning