Gligen: Open-set grounded text-to-image generation

Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, Yong Jae Lee, “Gligen: Open-set grounded text-to-image generation,” arXiv preprint arXiv:2301 · 2023 · arXiv 2301.07093

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Visual Instruction Tuning

cs.CV · 2023-04-17 · unverdicted · novelty 7.0

LLaVA is trained on GPT-4-generated visual instruction data and achieves an 85.1% relative score compared with GPT-4 on synthetic multimodal tasks and 92.53% accuracy on Science QA.

MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation

cs.CV · 2025-09-18 · unverdicted · novelty 6.0

MaskAttn-SDXL adds token-conditioned spatial gating to SDXL cross-attention to sparsify irrelevant token-to-location bindings and improve region-level controllability without retraining or inference edits.

citing papers explorer


Visual Instruction Tuning

cs.CV · 2023-04-17 · unverdicted · none · ref 30

MaskAttn-SDXL: Controllable Region-Level Text-To-Image Generation

cs.CV · 2025-09-18 · unverdicted · none · ref 10
