Goku provides a 2M-pair dataset for multi-task structural video editing, Goku-Edit model with MLLM and dual-branch design, and Goku-Bench yielding up to 8% gains in instruction following.
Allegro: Open the black box of commercial-level video generation model
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 10roles
background 2polarities
background 2representative citing papers
ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.
REINS uses supervised PCA on safety-labeled activations to find a linear direction that, when added to hidden states at roughly 50% depth in video diffusion transformers, redirects generations from unsafe to safe content across multiple models.
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.
ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic injected as negative-time memory in a diffusion model, plus supporting training techniques, to achieve SOTA subject preservation on human video benchmarks.
Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
A frequency-based latent compression method for video VAEs yields higher reconstruction quality than channel-reduction baselines at fixed compression ratios.
HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.
citing papers explorer
-
Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing
Goku provides a 2M-pair dataset for multi-task structural video editing, Goku-Edit model with MLLM and dual-branch design, and Goku-Bench yielding up to 8% gains in instruction following.
-
ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation
ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.
-
Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering
REINS uses supervised PCA on safety-labeled activations to find a linear direction that, when added to hidden states at roughly 50% depth in video diffusion transformers, redirects generations from unsafe to safe content across multiple models.
-
GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.
-
ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation
ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic injected as negative-time memory in a diffusion model, plus supporting training techniques, to achieve SOTA subject preservation on human video benchmarks.
-
Latent Spatial Memory for Video World Models
Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.
-
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
-
Latent-Compressed Variational Autoencoder for Video Diffusion Models
A frequency-based latent compression method for video VAEs yields higher reconstruction quality than channel-reduction baselines at fixed compression ratios.
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.
-
Open-Sora Plan: Open-Source Large Video Generation Model
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.