EM-Vid introduces an entity-centric latent patch memory bank with sparse token conditioning and budgeted updates for training-free consistent multi-shot video generation.
arXiv preprint arXiv:2510.18692 (2025) 5
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6verdicts
UNVERDICTED 6roles
background 2polarities
background 2representative citing papers
CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.
MuSS is a new movie-sourced dataset and benchmark that enables AI models to generate multi-shot videos with improved narrative coherence and subject identity preservation.
MiTA makes attention scalable by gathering query-aware top-k key-value pairs through landmarks as deformable routed experts and compressing the N-width fast-weight MLP into a shared narrower expert.
Long-CODE isolates long-context video evaluation with a new benchmark dataset and shot-dynamics metric that correlates better with human judgments on narrative richness and global consistency than short-video metrics.
Introduces CineDance-1M dataset for multi-shot long-form text-to-audio-video generation along with CineBench and a model adaptation.
citing papers explorer
-
Mixture-of-Top-k Attention: Efficient Attention via Scalable Fast Weights
MiTA makes attention scalable by gathering query-aware top-k key-value pairs through landmarks as deformable routed experts and compressing the N-width fast-weight MLP into a shared narrower expert.