LongLive-RAG formulates long video generation as retrieval-augmented generation by treating self-generated latents as a dynamic searchable history and adding a Window Temporal Delta Loss for better retrieval.
Pretraining frame preservation in autoregressive video memory compression. arXiv preprint arXiv:2512.23851
12 Pith papers cite this work.
abstract
History context is central to autoregressive video generation, driving consistency and storytelling for both commercial models and personal use cases. For example, personal users, offline workflows, and individual-scale finetuning need to encode longer video histories under tight compute and memory budgets. We observe that content and identity consistency is an essential requirement, and that complete, uninterrupted history coverage together with content query and interpretation capabilities is broadly desired. We present TinyHistory, a lightweight history embedding learned through two-stage context learning. In the first stage, we pretrain the encoder on large-scale video data with a randomized frame query objective; in the second stage, we repurpose the pretrained encoder within an autoregressive video diffusion model to learn content-level consistency. As a result, we show that the learned lightweight embeddings achieve consistency comparable to heavier alternatives (as measured by VLM, VBench, ELO, etc.), while reducing training overhead and extending the encodable history length within a given memory budget. We conduct ablation studies to analyze the influence and trade-offs of each component.
years
2026: 12
verdicts
UNVERDICTED: 12
representative citing papers
DySink maintains a memory bank and retrieves relevant historical frames as dynamic sinks while using an anomaly gate to suppress collapse, yielding higher temporal quality and dynamic degree on minute-long videos.
ABC enables any-subset autoregressive generation of continuous stochastic processes via non-Markovian diffusion bridges that track physical time and allow path-dependent conditioning.
A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.
Grounded Forcing introduces dual memory caching, reference-based positional embeddings, and proximity-weighted recaching to bridge stable semantics with local dynamics, improving long-range consistency in autoregressive video synthesis.
InteractiveAvatar is a real-time infinite-streaming avatar video generation system using autoregressive distillation, Long-Short Visual Memory for consistency, and a Reasoning-Reaction Module for intent-aware interactions.
Echo-Infinity replaces handcrafted KV-cache schedules with end-to-end optimized Memory Queries and a Unified Relative RoPE recipe to support real-time infinite video generation in diffusion transformers.
OmniMem enables scalable long video generation via adaptive sparse KV retrieval that addresses local bias and union explosion while preserving explicit historical access.
FlowLong generates videos several times longer than native model windows by blending adjacent predictions with Tweedie matching to enforce manifold and temporal consistency while using stochastic noise injection early and deterministic sampling later.
IAMFlow is a training-free identity-aware memory system that tracks entities via LLM global ID assignment and VLM frame verification to reduce identity drift in narrative long video generation from shifting prompts.
RAVEN aligns training and inference for causal autoregressive video diffusion via interleaved rollout repacking and introduces CM-GRPO for direct RL on consistency-model kernels, claiming better quality than recent baselines.
EverAnimate restores drifted latent flow trajectories in chunked video generation via persistent latent propagation and restorative flow matching, achieving measurable gains in PSNR, SSIM, LPIPS, and FID over prior long-animation methods with only LoRA tuning.