pith.

Canonical reference

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Canonical reference. 83% of citing Pith papers cite this work as background.

60 Pith papers citing it
Background 83% of classified citations
abstract

Streaming video generation, as a fundamental component of interactive world models and neural game engines, aims to generate high-quality, low-latency, and temporally coherent long video streams. However, most existing work suffers from severe error accumulation that often significantly degrades the generated video streams over long horizons. We design Rolling Forcing, a novel video generation technique that enables streaming long videos with minimal error accumulation. Rolling Forcing comes with three novel designs. First, instead of iteratively sampling individual frames, which accelerates error propagation, we design a joint denoising scheme that simultaneously denoises multiple frames with progressively increasing noise levels. This design relaxes the strict causality across adjacent frames, effectively suppressing error growth. Second, we introduce the attention sink mechanism into the long-horizon streaming video generation task, which allows the model to keep the key-value states of initial frames as a global context anchor and thereby enhances long-term global consistency. Third, we design an efficient training algorithm that enables few-step distillation over greatly extended denoising windows. This algorithm operates on non-overlapping windows and mitigates exposure bias conditioned on self-generated histories. Extensive experiments show that Rolling Forcing enables real-time streaming generation of multi-minute videos on a single GPU, with substantially reduced error accumulation.
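As a rough illustration of the first two designs described in the abstract, the sketch below models a rolling window of frames held at staggered noise levels (one joint denoising pass advances every frame by one step, so the head frame becomes clean and is emitted) together with a KV cache that pins the first frames as an attention sink. This is a hypothetical sketch, not the authors' implementation: `Frame`, `SinkKVCache`, `rolling_generate`, the integer noise-step schedule, the staggered prefill of the first window, and the use of frame indices as stand-ins for KV states are all assumptions, and the denoiser is a placeholder callable. The training-side distillation (third design) is not modeled.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Frame:
    index: int   # position in the output stream
    step: int    # remaining denoising steps; 0 means clean
    history: List[int] = field(default_factory=list)  # steps this frame was denoised at


class SinkKVCache:
    """Hypothetical KV cache: the first `num_sink` entries are pinned as an
    attention sink; later entries live in a bounded rolling window."""

    def __init__(self, num_sink: int, max_recent: int) -> None:
        self.num_sink = num_sink
        self.sink: List[object] = []
        self.recent: deque = deque(maxlen=max_recent)  # deque evicts the oldest entry

    def append(self, kv: object) -> None:
        if len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            self.recent.append(kv)

    def context(self) -> List[object]:
        # Sink entries are always visible, followed by the most recent history.
        return self.sink + list(self.recent)


# Placeholder for the model: called once per frame per joint pass.
Denoiser = Callable[[Frame, List[object]], None]


def rolling_generate(num_frames: int, window: int, denoiser: Denoiser,
                     cache: SinkKVCache) -> Iterator[Frame]:
    """Emit frames in order from a window whose noise increases from head to tail."""
    active: deque = deque()
    # Simplifying assumption: prefill the window with staggered noise steps 1..window.
    for i in range(min(window, num_frames)):
        active.append(Frame(index=i, step=i + 1))
    next_index = len(active)

    while active:
        context = cache.context()
        # One joint pass: every frame in the window is denoised by one step.
        for frame in active:
            frame.history.append(frame.step)
            denoiser(frame, context)
            frame.step -= 1
        head = active.popleft()       # the head frame has just reached step 0
        cache.append(head.index)      # stand-in for the clean frame's KV states
        yield head
        if next_index < num_frames:   # a fresh pure-noise frame enters at the tail
            active.append(Frame(index=next_index, step=window))
            next_index += 1


if __name__ == "__main__":
    cache = SinkKVCache(num_sink=1, max_recent=2)
    frames = list(rolling_generate(num_frames=6, window=3,
                                   denoiser=lambda frame, ctx: None, cache=cache))
    print([f.index for f in frames])  # [0, 1, 2, 3, 4, 5]
    print(cache.context())            # [0, 4, 5]: sink frame 0 plus the two newest
    print(frames[4].history)          # [3, 2, 1]
```

In this sketch, once the window is full each joint pass emits exactly one clean frame, and every frame that enters at the tail is denoised over `window` passes while sharing each pass with its neighbors; the sink entry stays in the context no matter how long the stream runs, while older non-sink entries are evicted.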


citation-role summary

background: 11 · extension: 1


years

2026: 56 · 2025: 4


representative citing papers

SwiftVR: Real-Time One-Step Generative Video Restoration

cs.CV · 2026-06-08 · unverdicted · novelty 7.0

SwiftVR achieves real-time generative video restoration at 1080p on consumer GPUs (26 FPS on RTX 5090) and higher resolutions on H100 via efficient dense attention and chunk-wise autoencoding.

AdaState: Self-Evolving Anchors for Streaming Video Generation

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

AdaState replaces the static first-frame KV anchor with an evolving hidden latent that the model denoises alongside content, treating time as relative to enable recurrence and richer dynamics in streaming video generation.

Q-ARVD: Quantizing Autoregressive Video Diffusion Models

cs.CV · 2026-05-20 · unverdicted · novelty 7.0

Q-ARVD introduces final-quality-aware frame weighting and outlier-aware adaptive dual-scale quantization to enable accurate low-bit inference for autoregressive video diffusion models.

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

DySink maintains a memory bank and retrieves relevant historical frames as dynamic sinks while using an anomaly gate to suppress collapse, yielding higher temporal quality and dynamic degree on minute-long videos.

Efficient Video Diffusion Models: Advancements and Challenges

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A survey that groups efficient video diffusion methods into four paradigms—step distillation, efficient attention, model compression, and cache/trajectory optimization—and outlines open challenges for practical use.

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.

citing papers explorer


  • DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos cs.RO · 2026-02-06 · unverdicted · none · ref 63

    DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

  • AR Forcing: Towards Long-Horizon Robot Navigation World Model cs.RO · 2026-05-29 · unverdicted · none · ref 22

    AR Forcing trains diffusion world models by integrating standard noise prediction loss into an autoregressive loop that uses self-generated predictions as context, reducing train-inference mismatch for improved long-horizon image consistency and trajectory accuracy on navigation datasets.