FadeMem introduces distance-aware KV memory consolidation for autoregressive video diffusion that builds a temporal hierarchy with power-law merging to preserve short-term dynamics and long-range coherence under fixed cache budget.
hub Baseline reference
VBench++: Comprehensive and versatile benchmark suite for video generative models.IEEE Transactions on Pattern Analysis and Machine Intelligence
Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.
hub tools
citation-role summary
citation-polarity summary
years
2026 16verdicts
UNVERDICTED 16representative citing papers
SPAWN enables training-free insertion of custom visual concepts into autoregressive world models by swapping the pinned context-memory anchor over a short injection window.
AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.
DirectorBench is a profile-aware diagnostic benchmark that localizes bottlenecks in long-form video generation workflows using structured checkpoints and multi-agent evaluation.
CutVerse benchmark evaluates GUI agents on 186 complex media post-production tasks in seven apps and reports 36% success rate for existing models.
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
ACT is a trajectory-conditioned framework for topology-general skeletal animation that injects 3D point trajectories from monocular video into skeletons via a Routed Trajectory Injector for improved fidelity and temporal consistency.
ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic injected as negative-time memory in a diffusion model, plus supporting training techniques, to achieve SOTA subject preservation on human video benchmarks.
VISTA introduces a new synthetic triplet dataset and diffusion-transformer framework with style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.
Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.
SwiftI2V achieves comparable 2K I2V quality to end-to-end models on VBench-I2V while cutting GPU time by 202x through low-resolution motion planning followed by strongly image-conditioned segment-wise high-resolution synthesis.
Purrsuasion is a negotiation game that surfaces satisficing and intent-attribution difficulties when students practice ethical data disclosure under real constraints.
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
A spectral framework for nonlinear DR uses spectral bases plus cross-entropy optimization to create multi-scale embeddings that preserve both global manifold geometry and local neighborhoods while supporting graph-frequency analysis.
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.
Focused Forcing is a training-free per-frame KV selection method that combines attention scores with diversity metrics and head-importance estimation to accelerate autoregressive video diffusion up to 1.48x while improving quality.
citing papers explorer
-
SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation
SwiftI2V achieves comparable 2K I2V quality to end-to-end models on VBench-I2V while cutting GPU time by 202x through low-resolution motion planning followed by strongly image-conditioned segment-wise high-resolution synthesis.
-
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.
-
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion
Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.