HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.
Bidirectional sparse attention for faster video diffusion training
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6roles
background 3polarities
background 3representative citing papers
GTA generates 3D worlds from single images via a two-stage video diffusion process that prioritizes geometry before appearance to improve structural consistency.
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
Attention sparsity in video DiTs is an input-stable layer-wise property, enabling offline profiling and online bidirectional QK co-clustering for up to 1.93x speedup with PSNR up to 29 dB.
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
Spectral Progressive Diffusion progressively grows resolution during denoising of pretrained diffusion models via spectral noise expansion and a power-spectrum-derived schedule, enabling training-free speedups and a fine-tuning recipe.
citing papers explorer
-
HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention
HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.
-
GTA: Advancing Image-to-3D World Generation via Geometry Then Appearance Video Diffusion
GTA generates 3D worlds from single images via a two-stage video diffusion process that prioritizes geometry before appearance to improve structural consistency.
-
Efficient Video Diffusion Models: Advancements and Challenges
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
-
Attention Sparsity is Input-Stable: Training-Free Sparse Attention for Video Generation via Offline Sparsity Profiling and Online QK Co-Clustering
Attention sparsity in video DiTs is an input-stable layer-wise property, enabling offline profiling and online bidirectional QK co-clustering for up to 1.93x speedup with PSNR up to 29 dB.
-
Video Generation Models as World Models: Efficient Paradigms, Architectures and Algorithms
Video generation models can function as world simulators if efficiency gaps in spatiotemporal modeling are bridged via organized paradigms, architectures, and algorithms.
-
Spectral Progressive Diffusion for Efficient Image and Video Generation
Spectral Progressive Diffusion progressively grows resolution during denoising of pretrained diffusion models via spectral noise expansion and a power-spectrum-derived schedule, enabling training-free speedups and a fine-tuning recipe.