
VBench++: Comprehensive and versatile benchmark suite for video generative models. IEEE Transactions on Pattern Analysis and Machine Intelligence

Ziqi Huang, Fan Zhang, Xiaojie Xu, Yinan He, Jiashuo Yu, Ziyue Dong, Qianli Ma, Nattapol Chanpaisit, Chenyang Si, Yuming Jiang, Yaohui Wang, Xinyuan Chen, Ying-Cong Chen, Limin Wang, Dahua Lin, Yu Qiao, Ziwei Liu · 2025 · arXiv 2025.363389

Baseline reference. 50% of citing Pith papers use this work as a benchmark or comparison.

16 Pith papers cite this work.

Baseline role: 50% of classified citations.


Citation-role summary: background (3), dataset (3).

Citation-polarity summary: background (3), use dataset (3).

representative citing papers

FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion

cs.CV · 2026-06-09 · unverdicted · novelty 7.0

FadeMem introduces distance-aware KV memory consolidation for autoregressive video diffusion. It builds a temporal hierarchy with power-law merging to preserve both short-term dynamics and long-range coherence under a fixed cache budget.

From Zero to Hero: Training-Free Custom Concept Spawning in World Models

cs.CV · 2026-06-01 · unverdicted · novelty 7.0

SPAWN enables training-free insertion of custom visual concepts into autoregressive world models by swapping the pinned context-memory anchor over a short injection window.

AdaState: Self-Evolving Anchors for Streaming Video Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.

DirectorBench: Diagnosing Long-Form Video Generation with Personalized Multi-Agent Evaluation

cs.CL · 2026-05-28 · unverdicted · novelty 7.0

DirectorBench is a profile-aware diagnostic benchmark that localizes bottlenecks in long-form video generation workflows using structured checkpoints and multi-agent evaluation.

CutVerse: A Compositional GUI Agents Benchmark for Media Post-Production Editing

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

The CutVerse benchmark evaluates GUI agents on 186 complex media post-production tasks across seven apps and reports a 36% success rate for existing models.

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms (step distillation, efficient attention, model compression, and cache/trajectory optimization) and outlines open challenges for practical use.

Follow Your Track: Precise Skeleton Animation Controlled by 3D Trajectories

cs.CV · 2026-06-24 · unverdicted · novelty 6.0

ACT is a trajectory-conditioned framework for topology-general skeletal animation that injects 3D point trajectories from monocular video into skeletons via a Routed Trajectory Injector for improved fidelity and temporal consistency.

ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation

cs.CV · 2026-06-10 · unverdicted · novelty 6.0

ARGUS converts MLLM-selected identity evidence into a synchronized 3x3 mosaic, injects it into a diffusion model as negative-time memory, and adds supporting training techniques to achieve state-of-the-art subject preservation on human video benchmarks.

VISTA: Triplet-Supervised Video Style Transfer with Diffusion Transformers

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

VISTA introduces a new synthetic triplet dataset and a diffusion-transformer framework with a style adapter that jointly models style, content, and motion to achieve state-of-the-art video style transfer.

Head Forcing: Long Autoregressive Video Generation via Head Heterogeneity

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Head Forcing assigns tailored KV cache strategies to local, anchor, and memory attention heads plus head-wise RoPE re-encoding to extend autoregressive video generation from seconds to minutes without training.

SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

SwiftI2V achieves 2K I2V quality comparable to end-to-end models on VBench-I2V while cutting GPU time by 202x, using low-resolution motion planning followed by strongly image-conditioned, segment-wise high-resolution synthesis.

Investigating Ethical Data Communication with Purrsuasion: An Educational Game about Negotiated Data Disclosure

cs.HC · 2026-04-06 · unverdicted · novelty 6.0

Purrsuasion is a negotiation game that surfaces satisficing and intent-attribution difficulties when students practice ethical data disclosure under real constraints.

Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

Salt improves low-step video generation quality by adding endpoint-consistent regularization to distribution matching distillation and using cache-conditioned feature alignment for autoregressive models.

A Spectral Framework for Multi-Scale Nonlinear Dimensionality Reduction

cs.LG · 2026-04-02 · unverdicted · novelty 6.0

A spectral framework for nonlinear dimensionality reduction (DR) combines spectral bases with cross-entropy optimization to create multi-scale embeddings that preserve both global manifold geometry and local neighborhoods while supporting graph-frequency analysis.

Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion

cs.CV · 2026-02-08 · unverdicted · novelty 6.0

Rolling Sink is a training-free cache adjustment technique that maintains visual consistency in autoregressive video diffusion models for ultra-long open-ended generation beyond training horizons.

Focused Forcing: Content-Aware Per-Frame KV Selection for Efficient Autoregressive Video Diffusion

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Focused Forcing is a training-free per-frame KV selection method that combines attention scores with diversity metrics and head-importance estimation to accelerate autoregressive video diffusion by up to 1.48x while also improving quality.

citing papers explorer


SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generation cs.CV · 2026-05-07 · unverdicted · none · ref 8
Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation cs.CV · 2026-04-03 · unverdicted · none · ref 18
Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion cs.CV · 2026-02-08 · unverdicted · none · ref 41
