Cross-attention control in text-conditioned models enables localized and global image edits by editing only the input text prompt.
U-net: Convolutional networks for biomedical image segmentation
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2verdicts
UNVERDICTED 2representative citing papers
RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.
citing papers explorer
-
Prompt-to-Prompt Image Editing with Cross Attention Control
Cross-attention control in text-conditioned models enables localized and global image edits by editing only the input text prompt.
-
Learning Visual Feature-Based World Models via Residual Latent Action
RLA-WM predicts residual latent actions via flow matching to create visual feature world models that outperform prior feature-based and diffusion approaches while enabling offline video-based robot RL.