arXiv: 2605.09313 · 2 revisions
Attention Sinks in Diffusion Transformers: A Causal Analysis