Allegro: Open the black box of commercial-level video generation model

Yuan Zhou, Qiuyue Wang, Yuxuan Cai, Huan Yang · 2024 · arXiv 2410.15458

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing

cs.CV · 2026-06-29 · unverdicted · novelty 7.0 · 2 refs

Goku provides a 2M-pair dataset for multi-task structural video editing, Goku-Edit model with MLLM and dual-branch design, and Goku-Bench yielding up to 8% gains in instruction following.

ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation

cs.CV · 2026-03-18 · unverdicted · novelty 7.0

ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.

Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering

cs.CV · 2026-06-15 · unverdicted · novelty 6.0

REINS uses supervised PCA on safety-labeled activations to find a linear direction that, when added to hidden states at roughly 50% depth in video diffusion transformers, redirects generations from unsafe to safe content across multiple models.

GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving

cs.DC · 2026-06-11 · unverdicted · novelty 6.0 · 2 refs

GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.

ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic injected as negative-time memory in a diffusion model, plus supporting training techniques, to achieve SOTA subject preservation on human video benchmarks.

Latent Spatial Memory for Video World Models

cs.CV · 2026-06-08 · unverdicted · novelty 6.0

Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.

Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.

Latent-Compressed Variational Autoencoder for Video Diffusion Models

cs.CV · 2026-04-12 · unverdicted · novelty 6.0

A frequency-based latent compression method for video VAEs yields higher reconstruction quality than channel-reduction baselines at fixed compression ratios.

HunyuanVideo: A Systematic Framework For Large Video Generative Models

cs.CV · 2024-12-03 · unverdicted · novelty 5.0

HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.

Open-Sora Plan: Open-Source Large Video Generation Model

cs.CV · 2024-11-28 · unverdicted · novelty 4.0

Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.

citing papers explorer

Showing 10 of 10 citing papers.

Goku: A Million-Scale Universal Dataset and Benchmark for Instruction-Based Video Editing cs.CV · 2026-06-29 · unverdicted · none · ref 35 · 2 links
Goku provides a 2M-pair dataset for multi-task structural video editing, Goku-Edit model with MLLM and dual-branch design, and Goku-Bench yielding up to 8% gains in instruction following.
ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation cs.CV · 2026-03-18 · unverdicted · none · ref 84
ChopGrad truncates backpropagation to local frame windows in video diffusion models, reducing memory from linear in frame count to constant while enabling pixel-wise loss fine-tuning.
Pulling The REINS: Training-Free Safety Alignment of Video Diffusion Models via Representation Steering cs.CV · 2026-06-15 · unverdicted · none · ref 7
REINS uses supervised PCA on safety-labeled activations to find a linear direction that, when added to hidden states at roughly 50% depth in video diffusion transformers, redirects generations from unsafe to safe content across multiple models.
GF-DiT: Scheduling Parallelism for Diffusion Transformer Serving cs.DC · 2026-06-11 · unverdicted · none · ref 52 · 2 links
GF-DiT dynamically adapts parallelism during DiT serving via trajectory tasks and group-free collectives, reporting up to 6x throughput and 95% latency reduction versus static configurations.
ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation cs.CV · 2026-06-10 · unverdicted · none · ref 65
ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic injected as negative-time memory in a diffusion model, plus supporting training techniques, to achieve SOTA subject preservation on human video benchmarks.
Latent Spatial Memory for Video World Models cs.CV · 2026-06-08 · unverdicted · none · ref 35
Mirage stores and queries 3D scene information in diffusion latent space via depth-guided lifting and warping, yielding 10.57× faster generation and 55× smaller memory than explicit RGB point-cloud baselines while reaching SOTA on WorldScore.
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation cs.CV · 2026-04-28 · unverdicted · none · ref 61
Mutual Forcing trains a single native autoregressive audio-video model with mutually reinforcing few-step and multi-step modes via self-distillation to match 50-step baselines at 4-8 steps.
Latent-Compressed Variational Autoencoder for Video Diffusion Models cs.CV · 2026-04-12 · unverdicted · none · ref 56
A frequency-based latent compression method for video VAEs yields higher reconstruction quality than channel-reduction baselines at fixed compression ratios.
HunyuanVideo: A Systematic Framework For Large Video Generative Models cs.CV · 2024-12-03 · unverdicted · none · ref 104
HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.
Open-Sora Plan: Open-Source Large Video Generation Model cs.CV · 2024-11-28 · unverdicted · none · ref 29
Open-Sora Plan presents an open-source large video generation model that combines a Wavelet-Flow VAE, Joint Image-Video Skiparse Denoiser, and multi-dimensional data curation to achieve high-quality video outputs with public code and weights.

Allegro: Open the black box of commercial-level video generation model

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer