pith:CUQ74TIX
Attention Sinks in Diffusion Transformers: A Causal Analysis
Attention sinks in diffusion transformers can be removed without degrading text-image alignment.
arxiv:2605.09313 v3 · 2026-05-10 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CUQ74TIX5EPCS7HC7HKBUAEQSY}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Removing these sinks does not degrade text-image alignment (CLIP-T) or preference proxies (ImageReward, HPS-v2) at k=1. Only under stronger interventions (k≥10) does HPS-v2 exhibit a metric-dependent boundary, while CLIP-T remains robust throughout. The perceptual shifts induced by suppression are nonetheless sink-specific: roughly 6× larger than those from equal-budget random masking.
These claims rest on the assumption that the paired interventions on score and value paths causally isolate the contribution of attention sinks without introducing uncontrolled side effects on the diffusion trajectory, and that the chosen proxy metrics (CLIP-T, HPS-v2) faithfully measure semantic alignment independent of low-level perceptual style.
Suppressing attention sinks in diffusion transformers does not degrade CLIP-T alignment at moderate levels but induces sink-specific perceptual shifts six times larger than equal-budget random masking.
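The contrast drawn in the claims, suppressing the top-k sink tokens versus masking an equal number of randomly chosen tokens, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the definition of a sink as the key receiving the most total attention mass, the use of -inf logits for score-path suppression, and zeroed value rows for value-path suppression.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def attention(scores):
    """Row-wise softmax: one distribution over keys per query."""
    return [softmax(row) for row in scores]


def top_k_sinks(scores, k):
    """Hypothetical sink criterion: the k keys with the most attention mass summed over queries."""
    attn = attention(scores)
    n_keys = len(scores[0])
    mass = [sum(row[j] for row in attn) for j in range(n_keys)]
    return sorted(range(n_keys), key=lambda j: mass[j], reverse=True)[:k]


def suppress_scores(scores, keys):
    """Score-path intervention (assumed form): masked keys get zero weight after softmax.

    Assumes at least one key per row stays unmasked.
    """
    drop = set(keys)
    masked = [[-math.inf if j in drop else s for j, s in enumerate(row)] for row in scores]
    return attention(masked)


def suppress_values(values, keys):
    """Value-path intervention (assumed form): zero the value vectors of the chosen keys."""
    drop = set(keys)
    return [[0.0] * len(v) if j in drop else list(v) for j, v in enumerate(values)]


# Toy logits: 4 queries, 6 keys, with key 2 acting as a sink for every query.
scores = [[0.0] * 6 for _ in range(4)]
for row in scores:
    row[2] = 4.0

k = 1
sinks = top_k_sinks(scores, k)
sink_masked = suppress_scores(scores, sinks)

# Equal-budget control: mask the same number of keys, chosen uniformly at random.
rng = random.Random(0)
rand_keys = rng.sample(range(6), k)
rand_masked = suppress_scores(scores, rand_keys)
```

Comparing outputs computed from `sink_masked` and `rand_masked` against the unmasked baseline gives the kind of sink-versus-random contrast the claim describes, although the actual experiments measure perceptual and alignment metrics on generated images rather than raw attention weights.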
Receipt and verification
| First computed | 2026-06-19T16:09:59.054972Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
1521fe4d17e91e297ce2f9d41a0090961ec2e4bd113296322c2deaa95981042b
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CUQ74TIX5EPCS7HC7HKBUAEQSY \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1521fe4d17e91e297ce2f9d41a0090961ec2e4bd113296322c2deaa95981042b
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "2e678686ddf8e8f0a0a2aa85907c38763664fc2272aeaf5f5f2cbf60fb532d1f",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.CV",
"submitted_at": "2026-05-10T04:14:07Z",
"title_canon_sha256": "a76e32ef22598836394f31fe1af7964fc4df66be2d201c406ed9084348d99e7d"
},
"schema_version": "1.0",
"source": {
"id": "2605.09313",
"kind": "arxiv",
"version": 3
}
}
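The hashing step of the verification command can also be reproduced offline against the record shown above. The following is a minimal sketch that mirrors the canonicalization in the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the function name `canonical_sha256` is hypothetical, and the record is transcribed from this page rather than fetched from the service.

```python
import hashlib
import json


def canonical_sha256(record):
    """SHA-256 hex digest of the record serialized with sorted keys and no extra whitespace."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Record transcribed from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2e678686ddf8e8f0a0a2aa85907c38763664fc2272aeaf5f5f2cbf60fb532d1f",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-10T04:14:07Z",
        "title_canon_sha256": "a76e32ef22598836394f31fe1af7964fc4df66be2d201c406ed9084348d99e7d",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.09313", "kind": "arxiv", "version": 3},
}

# Canonical hash as listed on this page.
EXPECTED = "1521fe4d17e91e297ce2f9d41a0090961ec2e4bd113296322c2deaa95981042b"

digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == EXPECTED)
```

Because the keys are sorted before serialization, the digest does not depend on the order in which the fields appear in the source JSON, which is what makes the hash reproducible across clients.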