Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis

Tianhe Wu, Ruibin Li, Lei Zhang, Kede Ma · 2026 · cs.CV · arXiv 2602.03139

7 Pith papers cite this work. Polarity classification is still indexing.


abstract

Distribution matching distillation (DMD) facilitates few-step image generation by aligning a distilled student with a reference multi-step teacher. In practice, however, optimizing DMD can reduce sample diversity in few-step synthesis, and existing remedies typically rely on perceptual or adversarial regularization, leading to stability and scalability challenges during training. Here, we describe diversity-preserved DMD (DP-DMD), a role-separated distillation method inspired by the complementary roles of early and late denoising steps. Specifically, the first distillation step is trained with a teacher-derived target-prediction objective (e.g., v-prediction) to preserve sample diversity, while the remaining steps are optimized with the standard DMD loss to refine perceptual quality. DP-DMD, with no perceptual or adversarial regularization, no additional modules, and no teacher-generated reference samples, preserves sample diversity while maintaining competitive visual quality under few-step sampling, providing a simple and stable alternative to other DMD variants.
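The role separation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The function names, the plain-list stand-ins for tensors, and the mean-squared-error form of the first-step loss are all assumptions made for illustration. The DMD term uses the commonly described surrogate whose gradient with respect to the generated sample is proportional to the student-distribution ("fake") score minus the teacher ("real") score, with both scores treated as constants.

```python
from typing import List, Sequence

Vector = Sequence[float]


def target_prediction_loss(student_pred: Vector, teacher_pred: Vector) -> float:
    """Mean squared error between student and teacher predictions.

    Assumed form of a teacher-derived target-prediction objective
    (for example, both inputs could be v-predictions).
    """
    return sum((s - t) ** 2 for s, t in zip(student_pred, teacher_pred)) / len(student_pred)


def dmd_direction(fake_score: Vector, real_score: Vector) -> List[float]:
    """Element-wise fake score minus real score (the DMD update direction)."""
    return [f - r for f, r in zip(fake_score, real_score)]


def dmd_surrogate_loss(sample: Vector, fake_score: Vector, real_score: Vector) -> float:
    """Surrogate loss: mean of direction * sample, with the direction held constant.

    Its gradient with respect to each sample element is direction / len(sample).
    """
    direction = dmd_direction(fake_score, real_score)
    return sum(d * x for d, x in zip(direction, sample)) / len(sample)


def objective_for_step(step_index: int) -> str:
    """Hypothetical role assignment: first step vs. remaining steps."""
    if step_index < 0:
        raise ValueError("step_index must be non-negative")
    return "target_prediction" if step_index == 0 else "dmd"


def role_separated_loss(
    step_index: int,
    *,
    sample: Vector,
    student_pred: Vector,
    teacher_pred: Vector,
    fake_score: Vector,
    real_score: Vector,
) -> float:
    """Pick the loss for one distillation step according to its role."""
    if objective_for_step(step_index) == "target_prediction":
        return target_prediction_loss(student_pred, teacher_pred)
    return dmd_surrogate_loss(sample, fake_score, real_score)
```

In a real training loop the inputs would be tensors produced by the student, the teacher, and an auxiliary score estimator. The sketch only shows how the choice of objective could depend on the step index.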

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

CoDMD: Copula-aware Distribution Matching Distillation for Fast Video Generation

cs.CV · 2026-06-20 · unverdicted · novelty 7.0

CoDMD adds a copula-matching regularizer to DMD for distilling 50-step video diffusion models to 4 steps, reporting VBench scores of 84.46/84.87 on 1.3B/14B Wan-2.1-T2V models.

STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.

HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

HorizonDrive is a new anti-drifting autoregressive training and distillation method that enables minute-scale stable driving video rollouts by making the teacher model rollout-capable via scheduled rollout recovery and teacher rollout DMD.

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

cs.CV · 2026-04-11 · conditional · novelty 6.0

Hybrid Forcing combines linear temporal attention for long-range retention, block-sparse attention for efficiency, and decoupled distillation to achieve real-time unbounded 832x480 streaming video generation at 29.5 FPS.

Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

cs.CV · 2026-06-16 · unverdicted · novelty 5.0

Data-Forcing Distillation adds a teacher score discrepancy term to DMD-style distillation, restoring diversity and fidelity in few-step video models with 100-300 finetuning steps.

Qwen-Image-Flash: Beyond Objective Design

cs.CV · 2026-06-02 · unverdicted · novelty 4.0

Empirical analysis of data, guidance, and task mixture in few-step distillation of Qwen-Image-2.0 produces the Qwen-Image-Flash model with improved performance in unified generation and editing tasks.

Qwen-Image-2.0 Technical Report

cs.CV · 2026-05-11 · unverdicted · novelty 4.0

Qwen-Image-2.0 unifies high-fidelity image generation and precise editing by coupling Qwen3-VL with a Multimodal Diffusion Transformer, improving text rendering, photorealism, and complex prompt following over prior versions.

citing papers explorer

Showing 7 of 7 citing papers.

CoDMD: Copula-aware Distribution Matching Distillation for Fast Video Generation

cs.CV · 2026-06-20 · unverdicted · none · ref 34 · internal anchor

STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models

cs.CV · 2026-05-12 · unverdicted · none · ref 38 · internal anchor

HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation

cs.CV · 2026-05-12 · unverdicted · none · ref 24 · 2 links · internal anchor

Long-Horizon Streaming Video Generation via Hybrid Attention with Decoupled Distillation

cs.CV · 2026-04-11 · conditional · none · ref 46 · internal anchor

Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

cs.CV · 2026-06-16 · unverdicted · none · ref 32 · internal anchor

Qwen-Image-Flash: Beyond Objective Design

cs.CV · 2026-06-02 · unverdicted · none · ref 14 · internal anchor

Qwen-Image-2.0 Technical Report

cs.CV · 2026-05-11 · unverdicted · none · ref 27 · internal anchor
