No other representation component is needed: Diffusion transformers can provide representation guidance by themselves
6 Pith papers cite this work. Polarity classification is still indexing.
filters: years 2026 (6) · verdicts UNVERDICTED (6)
citing papers explorer
- STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models
  STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality (sketched after this list).
- Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment
  Layer-wise representation alignment lets diffusion language models reuse semantic structures from frozen autoregressive models, accelerating training by up to 4x without architectural changes beyond the attention mask (sketched after this list).
- D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models
  D-OPSD enables continuous supervised fine-tuning of few-step diffusion models via on-policy self-distillation, where the model acts as both teacher (multimodal context) and student (text-only context) on its own roll-outs (sketched after this list).
- Stage-adaptive audio diffusion modeling
  A semantic progress signal, derived from the slope of an SSL-feature discrepancy curve, enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines (sketched after this list).
- Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds
  Aligning the DDIM forward diffusion process with flow-matching manifold evolution enables high-quality generation without time conditioning; class-conditional synthesis then becomes possible with an unconditional denoiser by assigning each class its own time space (sketched after this list).
- Elucidating Representation Degradation Problem in Diffusion Model Training
  Diffusion models suffer representation degradation at high noise levels due to a recoverability mismatch; ERD mitigates this by dynamically reallocating optimization effort, accelerating convergence across backbones (sketched after this list).
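
To make the STRIDE mechanism concrete, here is a minimal NumPy sketch of PCA-directed pink-noise injection into a (tokens, dim) feature matrix. The 1/f spectral shaping along the token axis, the per-component scaling, and the `scale` parameter are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def pink_noise(n_samples: int, n_dims: int, rng: np.random.Generator) -> np.ndarray:
    """Noise with an approximately 1/f power spectrum along the first axis."""
    freqs = np.fft.rfftfreq(n_samples)
    freqs[0] = freqs[1]  # avoid division by zero at the DC bin
    spectrum = rng.standard_normal((freqs.size, n_dims)) / np.sqrt(freqs)[:, None]
    noise = np.fft.irfft(spectrum, n=n_samples, axis=0)
    return noise / noise.std()

def stride_perturb(features: np.ndarray, scale: float = 0.1, seed: int = 0) -> np.ndarray:
    """Inject pink noise along the principal axes of a (tokens, dim) feature matrix."""
    rng = np.random.default_rng(seed)
    centered = features - features.mean(axis=0, keepdims=True)
    # PCA via SVD: the rows of vt are principal directions in feature space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    noise = pink_noise(features.shape[0], s.size, rng)
    # Scale each component's noise by the data's spread along that component.
    noise *= scale * s / np.sqrt(features.shape[0])
    return features + noise @ vt

features = np.random.default_rng(1).standard_normal((77, 64))  # toy token features
print(stride_perturb(features).shape)  # (77, 64)
```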
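
A toy sketch of the layer-wise representation alignment idea, assuming hidden states from a trainable diffusion LM (student) and a frozen autoregressive model (teacher); the per-layer linear projections and the cosine objective are assumptions, not necessarily the paper's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerwiseAlignment(nn.Module):
    def __init__(self, n_layers: int, student_dim: int, teacher_dim: int):
        super().__init__()
        # One linear projection per aligned layer pair (a hypothetical design choice).
        self.proj = nn.ModuleList(
            nn.Linear(student_dim, teacher_dim) for _ in range(n_layers)
        )

    def forward(self, student_states, teacher_states):
        loss = 0.0
        for proj, s, t in zip(self.proj, student_states, teacher_states):
            # Negative cosine similarity between projected student features and
            # detached (frozen) teacher features, averaged over batch and tokens.
            loss = loss + (1 - F.cosine_similarity(proj(s), t.detach(), dim=-1)).mean()
        return loss / len(self.proj)

# Toy usage with random hidden states: 4 layers, batch 2, 16 tokens.
student = [torch.randn(2, 16, 256, requires_grad=True) for _ in range(4)]
teacher = [torch.randn(2, 16, 512) for _ in range(4)]
align = LayerwiseAlignment(4, 256, 512)
print(align(student, teacher).item())
```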
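
A minimal sketch of D-OPSD-style on-policy self-distillation, where one set of weights scores its own roll-outs twice: once with a richer multimodal context (teacher, no gradients) and once with a text-only context (student). `ToyModel`, the additive-context stub, and the KL objective are placeholders, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def self_distill_loss(model, rollout_tokens, multimodal_ctx, text_ctx):
    # Teacher pass: same weights, richer (multimodal) context, no gradients.
    with torch.no_grad():
        teacher_logits = model(rollout_tokens, multimodal_ctx)
    # Student pass: text-only context, gradients flow into the shared weights.
    student_logits = model(rollout_tokens, text_ctx)
    return F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )

# Toy stand-in: the context enters as an additive bias on the output logits.
class ToyModel(torch.nn.Module):
    def __init__(self, vocab=32, dim=8):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, tokens, ctx):
        return self.head(self.emb(tokens)) + ctx

model = ToyModel()
tokens = torch.randint(0, 32, (2, 5))  # stands in for the model's own roll-outs
loss = self_distill_loss(model, tokens, torch.randn(32), torch.zeros(32))
loss.backward()
print(loss.item())
```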
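
A sketch of turning an SSL-feature discrepancy history into a slope-based training-stage signal, as the stage-adaptive audio paper's summary suggests; the sliding window, thresholds, and stage labels are invented for illustration.

```python
import numpy as np

def progress_slope(discrepancies, window: int = 50) -> float:
    """Least-squares slope of the recent discrepancy history."""
    y = np.asarray(discrepancies[-window:], dtype=float)
    x = np.arange(y.size, dtype=float)
    return float(np.polyfit(x, y, deg=1)[0])

def training_stage(slope: float, fast: float = -1e-3, flat: float = -1e-5) -> str:
    """Map the slope to a coarse stage label (hypothetical thresholds)."""
    if slope < fast:
        return "early"  # discrepancy dropping quickly
    if slope < flat:
        return "mid"    # still improving, but more slowly
    return "late"       # plateaued

# Toy history: exponentially decaying discrepancy plus noise.
rng = np.random.default_rng(0)
history = list(np.exp(-np.linspace(0, 5, 300)) + 0.01 * rng.standard_normal(300))
s = progress_slope(history)
print(s, training_stage(s))
```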
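
An illustrative sketch of "separate time spaces per class": each class owns a disjoint sub-interval of the diffusion time axis, and deterministic DDIM updates for that class stay inside its interval, so a denoiser that receives no time input can still be steered per class. The exponential schedule and the toy denoiser are placeholders, not the paper's construction.

```python
import numpy as np

def class_time_grid(class_id: int, n_classes: int, n_steps: int) -> np.ndarray:
    """A decreasing time grid confined to class k's disjoint sub-interval of (0, 1]."""
    lo, hi = class_id / n_classes, (class_id + 1) / n_classes
    return np.linspace(hi, lo + 1e-3, n_steps)

def ddim_sample(eps_model, x, times, alpha_bar):
    """Deterministic DDIM updates over a decreasing time grid."""
    for t_cur, t_next in zip(times[:-1], times[1:]):
        a_cur, a_next = alpha_bar(t_cur), alpha_bar(t_next)
        eps = eps_model(x)  # note: the denoiser receives no time conditioning
        x0 = (x - np.sqrt(1 - a_cur) * eps) / np.sqrt(a_cur)
        x = np.sqrt(a_next) * x0 + np.sqrt(1 - a_next) * eps
    return x

alpha_bar = lambda t: np.exp(-4.0 * t)  # toy noise schedule, never exactly zero
eps_model = lambda x: np.tanh(x)        # toy stand-in for a trained denoiser
x = np.random.default_rng(0).standard_normal(4)
print(ddim_sample(eps_model, x, class_time_grid(2, 10, 50), alpha_bar))
```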
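
Finally, a hedged sketch of one possible form of dynamic optimization reallocation: track an exponential moving average of the training loss per noise bin and upweight bins with high residual loss. The binning, decay, and weighting rule are assumptions; ERD's actual recoverability-mismatch criterion may differ.

```python
import numpy as np

class NoiseBinReweighter:
    def __init__(self, n_bins: int = 10, decay: float = 0.99):
        self.ema = np.ones(n_bins)  # running loss estimate per noise bin
        self.decay = decay
        self.n_bins = n_bins

    def update(self, t: float, loss: float) -> None:
        """Fold one observed (time, loss) pair into the per-bin EMA."""
        b = min(int(t * self.n_bins), self.n_bins - 1)
        self.ema[b] = self.decay * self.ema[b] + (1 - self.decay) * loss

    def weight(self, t: float) -> float:
        """Loss weight for a sample at time t, favouring bins with high residual loss."""
        b = min(int(t * self.n_bins), self.n_bins - 1)
        return float(self.n_bins * self.ema[b] / self.ema.sum())

# Toy loop: losses grow with t, so high-noise bins end up upweighted.
rw = NoiseBinReweighter()
rng = np.random.default_rng(0)
for _ in range(1000):
    t = rng.uniform()
    rw.update(t, loss=float(t + 0.1 * rng.standard_normal()))
print(np.round(rw.ema, 2), round(rw.weight(0.95), 2))
```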