AttentionBender applies 2D transforms to cross-attention maps in video diffusion transformers, producing distributed distortions and glitch aesthetics that reveal entangled attention mechanisms while serving as both an XAI probe and creative tool.
VDT: General-Purpose Video Diffusion Transformers via Mask Modeling
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 8roles
background 2polarities
background 2representative citing papers
FrameDiT proposes Matrix Attention for DiTs to achieve SOTA video generation with improved temporal coherence and efficiency comparable to local factorized attention.
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.
Lumos-Nexus is a training-efficient video generation framework using two-stage alignment of a lightweight model followed by progressive frequency bridging to a high-fidelity generator in homogeneous latent space, plus the new VR-Bench for reasoning evaluation.
RoPeSLR combines 3D RoPE-guided sparse attention with head-wise low-rank parameterization to achieve sub-quadratic complexity in DiTs while preserving distance awareness for efficient ultra-long video synthesis.
LetsTalk combines a multimodal diffusion transformer, noise-regularized memory bank, deep compression autoencoder, and symbiotic/direct fusion schemes to achieve state-of-the-art quality and efficiency in long-duration talking video generation.
Introduces a lightweight temporal module to extend pretrained DiT video models with time editing capabilities while preserving the original generative prior.
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.
citing papers explorer
-
GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.