Goku provides a 2M-pair dataset for multi-task structural video editing, Goku-Edit model with MLLM and dual-branch design, and Goku-Bench yielding up to 8% gains in instruction following.
Allegro: Open the black box of commercial-level video generation model
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 10roles
background 2polarities
background 2representative citing papers
ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.
REINS uses supervised PCA on safety-labeled activations to find a linear direction that, when added to hidden states at roughly 50% depth in video diffusion transformers, redirects generations from unsafe to safe content across multiple models.
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.
ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic injected as negative-time memory in a diffusion model, plus supporting training techniques, to achieve SOTA subject preservation on human video benchmarks.
Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
A frequency-based latent compression method for video VAEs yields higher reconstruction quality than channel-reduction baselines at fixed compression ratios.
HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
citing papers explorer
-
GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.