RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Jiankang Deng; Ronglai Zuo; Yanzuo Lu

arxiv: 2605.15190 · v1 · pith:OSSELZHAnew · submitted 2026-05-14 · 💻 cs.CV

Yanzuo Lu, Ronglai Zuo, Jiankang Deng

Pith reviewed 2026-06-30 20:35 UTC · model grok-4.3

classification 💻 cs.CV

keywords autoregressive video generation · causal video diffusion · consistency models · reinforcement learning · real-time extrapolation · self-rollout repacking · distribution alignment

The pith

RAVEN aligns training attention with inference-time extrapolation in causal video diffusion by repacking self-rollouts into interleaved clean and noisy sequences.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the persistent gap between training history distributions and inference-time histories limits quality in causal autoregressive video diffusion models over long horizons. RAVEN closes this gap by repacking each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states, allowing chunk losses to directly supervise the history representations used for future predictions. This alignment supports real-time streaming generation where future chunks are extrapolated from previously generated content. The work also introduces CM-GRPO, which casts consistency sampling as a conditional Gaussian transition to enable direct online reinforcement learning on the kernel. Experiments show that RAVEN outperforms recent causal video distillation baselines on quality, semantic, and dynamic degree metrics, with additional gains when combined with CM-GRPO.

Core claim

RAVEN is a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. CM-GRPO reformulates a consistency sampling step as a conditional Gaussian transition and applies online RL directly to this kernel.
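To make the "conditional Gaussian transition" concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a rectified-flow style noising rule x_s = (1 - s) x_0 + s ε, a toy stand-in `consistency_fn` with a single scalar parameter `theta` (both hypothetical), and the standard GRPO clipped surrogate with group-normalized advantages. Under those assumptions, one consistency step predicts a clean endpoint and re-noises it to level s, so the next state is Gaussian with mean (1 - s) f(x_t, t) and standard deviation s, and its log-density is available in closed form.

```python
import numpy as np

def consistency_fn(x_t, t, theta):
    """Hypothetical stand-in for a consistency model: noisy state -> clean endpoint."""
    return theta * x_t / (1.0 + t)

def transition_mean_std(x_t, t, s, theta):
    """One step t -> s as N(mean, std^2 I), assuming x_s = (1 - s) * x0 + s * eps."""
    x0_hat = consistency_fn(x_t, t, theta)
    return (1.0 - s) * x0_hat, s

def sample_step(x_t, t, s, theta, rng):
    """Draw the next state from the Gaussian transition."""
    mean, std = transition_mean_std(x_t, t, s, theta)
    return mean + std * rng.standard_normal(x_t.shape)

def log_prob(x_s, x_t, t, s, theta):
    """Closed-form Gaussian log-density of the transition, summed over dimensions."""
    mean, std = transition_mean_std(x_t, t, s, theta)
    d = x_s.size
    quad = -0.5 * np.sum(((x_s - mean) / std) ** 2)
    return float(quad - d * np.log(std) - 0.5 * d * np.log(2.0 * np.pi))

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: rewards standardized within one group of samples."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def grpo_surrogate(logp_new, logp_old, advantages, clip=0.2):
    """Clipped policy-gradient surrogate (to be maximized)."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_old))
    clipped = np.clip(ratio, 1.0 - clip, 1.0 + clip)
    return float(np.mean(np.minimum(ratio * advantages, clipped * advantages)))

# Toy usage: a group of 4 samples from the same state, scored by a made-up reward.
rng = np.random.default_rng(0)
x_t = rng.standard_normal(4)
group = [sample_step(x_t, 1.0, 0.5, theta=1.0, rng=rng) for _ in range(4)]
rewards = [-float(np.sum(x ** 2)) for x in group]  # hypothetical reward
adv = group_relative_advantages(rewards)
logp_old = [log_prob(x, x_t, 1.0, 0.5, 1.0) for x in group]
# With identical old and new log-probs every ratio is 1, so the surrogate
# equals the mean of the standardized advantages, which is about 0.
objective = grpo_surrogate(logp_old, logp_old, adv)
```

Because the log-density comes directly from the sampling step itself, no separate stochastic auxiliary process is needed to define a policy likelihood in this sketch; whether the paper's kernel uses the same mean and variance is not stated in the text available here.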

What carries the argument

Repacking of self-rollouts into interleaved sequences of clean historical endpoints and noisy denoising states, which aligns training attention patterns with those at inference.
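One way to picture the repacking is the sketch below. It is an illustration under stated assumptions, not the paper's layout: the helper names `repack_interleaved` and `interleaved_attention_mask` are hypothetical, each chunk contributes one clean endpoint followed by one noisy denoising state, clean tokens attend causally to clean tokens, and each noisy chunk attends to the clean endpoints of strictly earlier chunks plus its own noisy tokens. The mask actually used is the one described in the paper's Figure 1.

```python
import numpy as np

def repack_interleaved(clean_chunks, noisy_chunks):
    """Pack chunks as [clean_0, noisy_0, clean_1, noisy_1, ...] (assumed layout).

    Each chunk has shape (tokens_per_chunk, dim). Returns the packed sequence,
    a per-token chunk index, and a per-token flag marking noisy tokens.
    """
    assert len(clean_chunks) == len(noisy_chunks)
    parts, chunk_ids, is_noisy = [], [], []
    for i, (clean, noisy) in enumerate(zip(clean_chunks, noisy_chunks)):
        for tensor, flag in ((clean, False), (noisy, True)):
            parts.append(tensor)
            chunk_ids.extend([i] * len(tensor))
            is_noisy.extend([flag] * len(tensor))
    return np.concatenate(parts, axis=0), np.array(chunk_ids), np.array(is_noisy)

def interleaved_attention_mask(chunk_ids, is_noisy):
    """Boolean mask where mask[q, k] is True if query token q may attend to key token k."""
    q_chunk, k_chunk = chunk_ids[:, None], chunk_ids[None, :]
    q_noisy, k_noisy = is_noisy[:, None], is_noisy[None, :]
    # Clean queries: causal attention over clean keys, including their own chunk.
    clean_to_clean = ~q_noisy & ~k_noisy & (k_chunk <= q_chunk)
    # Noisy queries: clean history from strictly earlier chunks ...
    noisy_to_history = q_noisy & ~k_noisy & (k_chunk < q_chunk)
    # ... plus the noisy tokens of the chunk currently being denoised.
    noisy_to_self = q_noisy & k_noisy & (k_chunk == q_chunk)
    return clean_to_clean | noisy_to_history | noisy_to_self

# Toy usage: 3 chunks, 2 tokens per chunk, feature dimension 4.
clean = [np.zeros((2, 4)) for _ in range(3)]
noisy = [np.ones((2, 4)) for _ in range(3)]
seq, ids, flags = repack_interleaved(clean, noisy)
mask = interleaved_attention_mask(ids, flags)
```

Since clean endpoints and noisy states share one packed forward pass in this sketch, a loss on a later noisy chunk can back-propagate into the clean history tokens it attends to, which is the supervision path the summary describes.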

If this is right

RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations.
CM-GRPO provides further performance gains when combined with RAVEN.
The method enables higher-quality real-time streaming video generation by extrapolating future chunks from generated history.
Chunk losses can now supervise history representations that future predictions depend on.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The interleaving idea could be tested in autoregressive generation tasks outside video, such as audio or text sequences, to reduce similar training-inference mismatches.
The framework might scale to longer video horizons where distribution shift typically grows most severe.
Direct RL on the consistency kernel might combine with other sampling accelerations beyond the reported experiments.

Load-bearing premise

Repacking self-rollouts into interleaved clean and noisy sequences aligns training attention with inference-time extrapolation without introducing new distribution shifts or training instabilities that offset the gains.

What would settle it

An ablation that removes or randomizes the interleaving step during training and measures whether the reported gains in long-horizon quality, semantic consistency, and dynamic degree disappear.

Figures

Figures reproduced from arXiv: 2605.15190 by Jiankang Deng, Ronglai Zuo, Yanzuo Lu.

**Figure 1.** Attention Mask Configuration. Autoregressive video diffusion training paradigms differ in how historical states enter attention and whether those states receive end-to-end supervision from later chunks. Teacher Forcing and Diffusion Forcing rely on data-driven historical states, inducing a training distribution that differs from inference. Self Forcing shifts the history distribution toward inference but r…

**Figure 2.** Training Pipeline. RAVEN builds on score distillation with a training-time test formulation that aligns the generator’s training context with inference. In the fake-score step, the frozen generator performs autoregressive self rollout with KV cache reuse, producing the clean endpoints and noisy denoising states that are subsequently reused in the generator step. Rather than discarding these rollout states …

**Figure 3.** Qualitative comparisons. See supplementary for playable video clips.

**Figure 4.** Qualitative ablation on Training-time Test. See supplementary for playable video clips.

**Figure 5.** Ablation on Chunk-wise Loss Scaling.

**Figure 6.** User study preference rates on Quality, Semantic, and Overall.

B More Implementation Details. Dataset. Both RAVEN and CM-GRPO are trained exclusively on text prompts drawn from VidProM [124], preprocessed through filtering and large language model extension, following the data protocol of Self Forcing [31]. Ablation experiments that require real video data draw from OpenVidHD-0.4M [125], with all video cli…

**Figure 7.** User study instruction screenshot.

D Discussion. Although RAVEN and CM-GRPO are presented with concrete design choices tailored to causal autoregressive video distillation, both formulations admit broader scope than the setting evaluated in our experiments. The interleaved sequence construction underlying RAVEN currently treats clean chunks as historical context, yet the supervised forward pass does not res…

Original abstract

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self rollout into an interleaved sequence of clean historical endpoints and noisy denoising states. This formulation aligns training attention with inference-time extrapolation and allows downstream chunk losses to supervise the history representations on which future predictions depend. We further propose Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online Reinforcement Learning (RL) directly to this kernel, avoiding the Euler-Maruyama auxiliary process adopted in prior flow-model RL formulations. Experiments demonstrate that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations, and that CM-GRPO provides further gains when combined with RAVEN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RAVEN's interleaved rollout repacking and CM-GRPO target the training-inference gap in causal video models but rest on unshown distribution matching and lack any metrics or derivations.

The two concrete moves here are repacking each self-rollout into an interleaved sequence of clean historical endpoints and noisy states, plus reformulating consistency sampling as a conditional Gaussian so RL can be applied directly without the usual Euler-Maruyama step.

These choices address a documented mismatch between training history distributions and inference-time extrapolation in autoregressive video diffusion. The chunk-loss supervision on history representations follows logically from that framing, and skipping the auxiliary process in the RL formulation is a clear simplification over prior flow-model work.

The abstract supplies no numbers, no dataset names, no ablation tables, and no equations showing that the interleaved joint distribution actually matches inference trajectories. The stress-test concern about new distribution shifts or attention instabilities therefore stands, because nothing in the text rules it out or quantifies the offset. Without those checks the reported gains on quality, semantic, and dynamic metrics cannot be evaluated.

The work is aimed at researchers already building few-step causal video generators for streaming or interactive use. A reader who needs a practical fix for long-horizon degradation might extract the repacking idea and test it themselves; the GRPO reformulation could be useful if the conditional Gaussian step preserves the required properties.

The paper deserves a serious referee who will press for the missing derivations, training details, and experimental controls. The central claim is testable once those are supplied, even if the current text leaves the alignment benefit unproven.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a training-time test framework that repacks each self-rollout into an interleaved sequence of clean historical endpoints and noisy denoising states to align training attention with inference-time extrapolation in causal autoregressive video diffusion models. It further proposes Consistency-model Group Relative Policy Optimization (CM-GRPO), which reformulates a consistency sampling step as a conditional Gaussian transition and applies online RL directly to this kernel, avoiding the Euler-Maruyama auxiliary process. Experiments are asserted to show that RAVEN surpasses recent causal video distillation baselines across quality, semantic, and dynamic degree evaluations, with additional gains when combined with CM-GRPO.

Significance. If validated, the work could advance real-time streaming video generation by mitigating history distribution gaps that limit long-horizon quality in autoregressive models. The CM-GRPO reformulation offers a direct RL approach on consistency kernels, which is a technical distinction from prior flow-model RL methods. No machine-checked proofs or reproducible code are mentioned, but the focus on aligning training and inference distributions is a relevant direction if the alignment is rigorously shown.

major comments (2)

[RAVEN description] The RAVEN formulation asserts that repacking self-rollouts into interleaved sequences of clean historical endpoints and noisy denoising states aligns training attention with inference-time extrapolation and enables chunk losses to supervise history representations, but provides no derivation showing that the resulting joint distribution over (clean, noisy) pairs matches inference trajectories. This is load-bearing for the central claim, as unanalyzed distribution shifts from the interleaving could offset the reported gains rather than achieve the intended alignment.
[Experiments] The abstract asserts experimental superiority over causal video distillation baselines on quality, semantic, and dynamic degree evaluations but supplies no metrics, dataset details, ablation studies, or implementation specifics. This absence prevents verification of the magnitude or reliability of the claimed improvements and of whether CM-GRPO provides further gains.

minor comments (1)

The acronym 'CM-GRPO' is expanded as 'Consistency-model Group Relative Policy Optimization' in the text; ensure the title's 'Consistency-model GRPO' is consistent or clarified.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below.

Referee: [RAVEN description] The RAVEN formulation asserts that repacking self-rollouts into interleaved sequences of clean historical endpoints and noisy denoising states aligns training attention with inference-time extrapolation and enables chunk losses to supervise history representations, but provides no derivation showing that the resulting joint distribution over (clean, noisy) pairs matches inference trajectories. This is load-bearing for the central claim, as unanalyzed distribution shifts from the interleaving could offset the reported gains rather than achieve the intended alignment.

Authors: We agree with the referee that a formal derivation is absent from the current manuscript. The interleaving strategy is motivated by the need to match the attention patterns, but without an explicit proof that the joint distribution is preserved, the alignment claim remains heuristic. We will add a detailed derivation in the revised version demonstrating that the repacked training distribution matches the inference trajectories under the causal autoregressive setting. revision: yes

Referee: [Experiments] The abstract asserts experimental superiority over causal video distillation baselines on quality, semantic, and dynamic degree evaluations but supplies no metrics, dataset details, ablation studies, or implementation specifics. This absence prevents verification of the magnitude or reliability of the claimed improvements and of whether CM-GRPO provides further gains.

Authors: The referee is correct that neither the abstract nor the provided manuscript text includes specific metrics, dataset details, or ablations. We will revise the manuscript to include a full Experiments section with quantitative results (e.g., specific scores on quality metrics), details on the datasets used, ablation studies isolating the contributions of RAVEN and CM-GRPO, and implementation specifics to substantiate the claims. revision: yes

Circularity Check

0 steps flagged

No circularity: method claims rest on explicit reformulations without reduction to inputs

The provided abstract and description contain no equations, fitted parameters renamed as predictions, or self-citations that bear the central load. RAVEN's repacking of self-rollouts and CM-GRPO's reformulation of consistency sampling as a conditional Gaussian are presented as design choices whose alignment benefits are asserted via experiment, not derived by construction from the inputs themselves. No step reduces a claimed result to a tautology or prior self-work that is itself unverified. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities are described or can be extracted.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold
cs.CV 2026-06 unverdicted novelty 6.0

MoVerse generates real-time interactive video world models from single narrow-FOV images via panoramic diffusion expansion, Gaussian scaffold lifting, and distillation of a bidirectional diffusion teacher into a causa...

Reference graph

Works this paper leans on

131 extracted references · 69 canonical work pages · cited by 1 Pith paper · 26 internal anchors

[1]

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Vladimir Arkhipkin, Vladimir Korviakov, Nikolai Gerasimenko, Denis Parkhomenko, Viacheslav Vasilev, Alexey Letunovskiy, Nikolai Vaulin, Maria Kovaleva, Ivan Kirillov, Lev Novitskiy, Denis Koposov, Nikita Kiselev, et al. Kandinsky 5.0: A family of foundation models for image and video generation.arXiv preprint arXiv:2511.14993, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Sana-video: Efficient video generation with block linear diffusion transformer

Junsong Chen, Yuyang Zhao, Jincheng Yu, Ruihang Chu, Junyu Chen, Shuai Yang, Xianbang Wang, Yicheng Pan, Daquan Zhou, Huan Ling, Haozhe Liu, Hongwei Yi, et al. Sana-video: Efficient video generation with block linear diffusion transformer. InICLR, 2026

2026
[3]

Seedance 1.0: Exploring the Boundaries of Video Generation Models

Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang, Fangyuan Kong, Huixia Li, Jiashi Li, Liang Li, Xiaojie Li, Xunsong Li, Yifu Li, et al. Seedance 1.0: Exploring the boundaries of video generation models.arXiv preprint arXiv:2506.09113, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[4]

Mochi 1: Ai video generator

Genmo. Mochi 1: Ai video generator. https://www.genmo.ai/blog/mochi-1-a-new-sota-in-o pen-text-to-video, 2024

2024
[5]

LTX-Video: Realtime Video Latent Diffusion

Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richardson, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, et al. Ltx-video: Realtime video latent diffusion.arXiv preprint arXiv:2501.00103, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[6]

LTX-2: Efficient Joint Audio-Visual Foundation Model

Yoav HaCohen, Benny Brazowski, Nisan Chiprut, Yaki Bitterman, Andrew Kvochko, Avishai Berkowitz, Daniel Shalem, Daphna Lifschitz, Dudu Moshe, Eitan Porat, Eitan Richardson, Guy Shiran, et al. Ltx-2: Efficient joint audio-visual foundation model.arXiv preprint arXiv:2601.03233, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[7]

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[8]

Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, et al. Step-video-t2v technical report: The practice, challenges, and future of video foundation model.arXiv preprint arXiv:2502.10248, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[9]

Movie gen: A cast of media foundation models

Meta. Movie gen: A cast of media foundation models. https://ai.meta.com/static-resource/ movie-gen-research-paper, 2024

2024
[10]

Cosmos world foundation model platform for physical ai

NVIDIA. Cosmos world foundation model platform for physical ai. https://research.nvidia.co m/publication/2025-01_cosmos-world-foundation-model-platform-physical-ai, 2025

2025
[11]

Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model

Team Seedance, Heyi Chen, Siyan Chen, Xin Chen, Yanfei Chen, Ying Chen, Zhuo Chen, Feng Cheng, Tianheng Cheng, Xinqi Cheng, Xuyan Chi, Jian Cong, et al. Seedance 1.5 pro: A native audio-visual joint generation foundation model.arXiv preprint arXiv:2512.13507, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[12]

Seedance 2.0: Advancing Video Generation for World Complexity

Team Seedance, De Chen, Liyang Chen, Xin Chen, Ying Chen, Zhuo Chen, Zhuowei Chen, Feng Cheng, Tianheng Cheng, Yufeng Cheng, Mojie Chi, Xuyan Chi, et al. Seedance 2.0: Advancing video generation for world complexity.arXiv preprint arXiv:2604.14148, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[13]

Wan: Open and Advanced Large-Scale Video Generative Models

Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, et al. Wan: Open and advanced large-scale video generative models.arXiv preprint arXiv:2503.20314, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[14]

HunyuanVideo 1.5 Technical Report

Bing Wu, Chang Zou, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Jack Peng, Jianbing Wu, Jiangfeng Xiong, Jie Jiang, Linus, Patrol, et al. Hunyuanvideo 1.5 technical report.arXiv preprint arXiv:2511.18870, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

Cogvideox: Text-to-video diffusion models with an expert transformer

Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, et al. Cogvideox: Text-to-video diffusion models with an expert transformer. InICLR, 2025. 10

2025
[16]

Sand ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, W. Q. Zhang, Weifeng Luo, Xiaoyang Kang, et al. Magi-1: Autoregressive video generation at scale.arXiv preprint arXiv:2505.13211, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[17]

Causality in video diffusers is separable from denoising.arXiv preprint arXiv:2602.10095, 2026

Xingjian Bai, Guande He, Zhengqi Li, Eli Shechtman, Xun Huang, and Zongze Wu. Causality in video diffusers is separable from denoising.arXiv preprint arXiv:2602.10095, 2026

work page arXiv 2026
[18]

SkyReels-V2: Infinite-length Film Generative Model

Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, Mingyuan Fan, Hao Zhang, Sheng Chen, Zheng Chen, Chengcheng Ma, Weiming Xiong, Wei Wang, et al. Skyreels-v2: Infinite-length film generative model.arXiv preprint arXiv:2504.13074, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[19]

Autoregressive video generation without vector quantization

Haoge Deng, Ting Pan, Haiwen Diao, Zhengxiong Luo, Yufeng Cui, Huchuan Lu, Shiguang Shan, Yonggang Qi, and Xinlong Wang. Autoregressive video generation without vector quantization. InICLR, 2025

2025
[20]

End-to-end training for autoregressive video diffusion via self-resampling.arXiv preprint arXiv:2512.15702, 2025

Yuwei Guo, Ceyuan Yang, Hao He, Yang Zhao, Meng Wei, Zhenheng Yang, Weilin Huang, and Dahua Lin. End-to-end training for autoregressive video diffusion via self-resampling.arXiv preprint arXiv:2512.15702, 2025

work page arXiv 2025
[21]

Live: Long-horizon interactive video world modeling,

Junchao Huang, Ziyang Ye, Xinting Hu, Tianyu He, Guiyu Zhang, Shaoshuai Shi, Jiang Bian, and Li Jiang. Live: Long-horizon interactive video world modeling.arXiv preprint arXiv:2602.03747, 2026

work page arXiv 2026
[22]

Pyramidal flow matching for efficient video generative modeling

Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Pyramidal flow matching for efficient video generative modeling. InICLR, 2025

2025
[23]

Stable video infinity: Infinite- length video generation with error recycling

Wuyang Li, Wentao Pan, Po-Chien Luan, Yang Gao, and Alexandre Alahi. Stable video infinity: Infinite- length video generation with error recycling. InICLR, 2026

2026
[24]

Infinitystar: Unified spacetime autoregressive modeling for visual generation

Jinlai Liu, Jian Han, Bin Yan, Hui Wu, Fengda Zhu, Xing Wang, Yi Jiang, Bingyue Peng, and Zehuan Yuan. Infinitystar: Unified spacetime autoregressive modeling for visual generation. InNeurIPS, 2025

2025
[25]

Bagger: Backwards aggregation for mitigating drift in autoregressive video diffusion models.arXiv preprint arXiv:2512.12080, 2025

Ryan Po, Eric Ryan Chan, Changan Chen, and Gordon Wetzstein. Bagger: Backwards aggregation for mitigating drift in autoregressive video diffusion models.arXiv preprint arXiv:2512.12080, 2025

work page arXiv 2025
[26]

Pack and force your memory: Long-form and consistent video generation.arXiv preprint arXiv:2510.01784, 2025

Xiaofei Wu, Guozhen Zhang, Zhiyong Xu, Yuan Zhou, Qinglin Lu, and Xuming He. Pack and force your memory: Long-form and consistent video generation.arXiv preprint arXiv:2510.01784, 2025

work page arXiv 2025
[27]

Macro-from-micro planning for high-quality and parallelized autoregressive long video generation.arXiv preprint arXiv:2508.03334, 2025

Xunzhi Xiang, Yabo Chen, Guiyu Zhang, Zhongyu Wang, Zhe Gao, Quanming Xiang, Gonghu Shang, Junqi Liu, Haibin Huang, Yang Gao, Chi Zhang, Qi Fan, et al. Macro-from-micro planning for high-quality and parallelized autoregressive long video generation.arXiv preprint arXiv:2508.03334, 2025

work page arXiv 2025
[28]

Helios: Real real-time long video generation model.arXiv preprint arXiv:2603.04379, 2026

Shenghai Yuan, Yuanyang Yin, Zongjian Li, Xinwei Huang, Xiao Yang, and Li Yuan. Helios: Real real-time long video generation model.arXiv preprint arXiv:2603.04379, 2026

work page arXiv 2026
[29]

BIFE: Better Interaction, Fewer Errors for Minute-Long Video Generation

Zeyu Zhang, Shuning Chang, Yuanyu He, Yizeng Han, Jiasheng Tang, Fan Wang, and Bohan Zhuang. Blockvid: Block diffusion for high-quality and consistent minute-long video generation.arXiv preprint arXiv:2511.22973, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[30]

TinyHistory: Lightweight Video History Embeddings via Two-Stage Context Learning

Lvmin Zhang, Shengqu Cai, Muyang Li, Chong Zeng, Beijia Lu, Anyi Rao, Song Han, Gordon Wetzstein, and Maneesh Agrawala. Pretraining frame preservation for lightweight autoregressive video history embedding.arXiv preprint arXiv:2512.23851, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[31]

Self forcing: Bridging the train-test gap in autoregressive video diffusion

Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, and Eli Shechtman. Self forcing: Bridging the train-test gap in autoregressive video diffusion. InNeurIPS, 2025

2025
[32]

Rolling forcing: Autoregressive long video diffusion in real time

Kunhao Liu, Wenbo Hu, Jiale Xu, Ying Shan, and Shijian Lu. Rolling forcing: Autoregressive long video diffusion in real time. InICLR, 2026

2026
[33]

Reward forcing: Efficient streaming video generation with rewarded distribution matching distillation

Yunhong Lu, Yanhong Zeng, Haobo Li, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jiapeng Zhu, Hengyuan Cao, Zhipeng Zhang, Xing Zhu, Yujun Shen, and Min Zhang. Reward forcing: Efficient streaming video generation with rewarded distribution matching distillation. InCVPR, 2026

2026
[34]

Longlive: Real-time interactive long video generation

Shuai Yang, Wei Huang, Ruihang Chu, Yicheng Xiao, Yuyang Zhao, Xianbang Wang, Muyang Li, Enze Xie, Yingcong Chen, Yao Lu, Song Han, and Yukang Chen. Longlive: Real-time interactive long video generation. InICLR, 2026

2026
[35]

Freeman, Fredo Durand, Eli Shechtman, and Xun Huang

Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, and Xun Huang. From slow bidirectional to fast autoregressive video diffusion models. InCVPR, 2025. 11

2025
[36]

Causal forcing: Autore- gressive diffusion distillation done right for high-quality real-time interactive video generation

Hongzhou Zhu, Min Zhao, Guande He, Hang Su, Chongxuan Li, and Jun Zhu. Causal forcing: Autore- gressive diffusion distillation done right for high-quality real-time interactive video generation. InICML, 2026

2026
[37]

Diffusion forcing: Next-token prediction meets full-sequence diffusion

Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, and Vincent Sitzmann. Diffusion forcing: Next-token prediction meets full-sequence diffusion. InNeurIPS, 2024

2024
[38]

History- guided video diffusion

Kiwhan Song, Boyuan Chen, Max Simchowitz, Yilun Du, Russ Tedrake, and Vincent Sitzmann. History- guided video diffusion. InICML, 2025

2025
[39]

Freeman, and Taesung Park

Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. InCVPR, pages 6613–6623, 2024

2024
[40]

Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T. Freeman. Improved distribution matching distillation for fast image synthesis. InNeurIPS, 2024

2024
[41]

Flow-grpo: Training flow matching models via online rl

Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. InNeurIPS, 2025

2025
[42]

Context forcing: Consistent autoregressive video generation with long context.arXiv preprint arXiv:2602.06028, 2026

Shuo Chen, Cong Wei, Sun Sun, Ping Nie, Kai Zhou, Ge Zhang, Ming-Hsuan Yang, and Wenhu Chen. Context forcing: Consistent autoregressive video generation with long context.arXiv preprint arXiv:2602.06028, 2026

work page arXiv 2026
[43]

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, and Cho-Jui Hsieh. Self-forcing++: Towards minute-scale high-quality video generation.arXiv preprint arXiv:2510.02283, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[44]

Streaming autoregressive video generation via diagonal distillation

Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, and Weiyang Liu. Streaming autoregressive video generation via diagonal distillation. InICLR, 2026

2026
[45]

Hiar: Efficient autoregressive long video generation via hierarchical denoising.arXiv preprint arXiv:2603.08703, 2026

Kai Zou, Dian Zheng, Hongbo Liu, Tiankai Hang, Bin Liu, and Nenghai Yu. Hiar: Efficient autoregressive long video generation via hierarchical denoising.arXiv preprint arXiv:2603.08703, 2026

work page arXiv 2026
[46]

Past- and future-informed kv cache policy with salience estimation in autoregressive video diffusion.arXiv preprint arXiv:2601.21896, 2026

Hanmo Chen, Chenghao Xu, Xu Yang, Xuan Chen, and Cheng Deng. Past- and future-informed kv cache policy with salience estimation in autoregressive video diffusion.arXiv preprint arXiv:2601.21896, 2026

work page arXiv 2026
[47]

Memflow: Flowing adaptive memory for consistent and efficient long video narratives.arXiv preprint arXiv:2512.14699, 2025

Sihui Ji, Xi Chen, Shuai Yang, Xin Tao, Pengfei Wan, and Hengshuang Zhao. Memflow: Flowing adaptive memory for consistent and efficient long video narratives.arXiv preprint arXiv:2512.14699, 2025

work page arXiv 2025
[48]

Motionstream: Real-time video generation with interactive motion controls

Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, and Xun Huang. Motionstream: Real-time video generation with interactive motion controls. InICLR, 2026

2026
[49]

Videossm: Autoregressive long video gen- eration with hybrid state-space memory.arXiv preprint arXiv:2512.04519, 2025

Yifei Yu, Xiaoshan Wu, Xinting Hu, Tao Hu, Yangtian Sun, Xiaoyang Lyu, Bo Wang, Lin Ma, Yuewen Ma, Zhongrui Wang, and Xiaojuan Qi. Videossm: Autoregressive long video generation with hybrid state-space memory.arXiv preprint arXiv:2512.04519, 2025

work page arXiv 2025
[50]

Tianrui Zhu, Shiyi Zhang, Zhirui Sun, Jingqi Tian, and Yansong Tang. Memorize-and-generate: Towards long-term consistency in real-time video generation. arXiv preprint arXiv:2512.18741, 2025.

[51]

Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, and Cho-Jui Hsieh. Lol: Longer than longer, scaling video generation to hour. arXiv preprint arXiv:2601.16914, 2026.

[52]

Jia Li, Xiaomeng Fu, Xurui Peng, Weifeng Chen, Youwei Zheng, Tianyu Zhao, Jiexi Wang, Fangmin Chen, Xing Wang, and Hayden Kwok-Hay So. Train short, inference long: Training-free horizon extension for autoregressive video generation. arXiv preprint arXiv:2602.14027, 2026.

[53]

Xunzhi Xiang, Zixuan Duan, Guiyu Zhang, Haiyu Zhang, Zhe Gao, Junta Wu, Shaofeng Zhang, Tengfei Wang, Qi Fan, and Chunchao Guo. Pathwise test-time correction for autoregressive long video generation. arXiv preprint arXiv:2602.05871, 2026.

[54]

Hidir Yesiltepe, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, and Pinar Yanardag. Infinity-rope: Action-controllable infinite video generation emerges from autoregressive self-rollout. In CVPR, 2026.

[55]

Jung Yi, Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, and Seungryong Kim. Deep forcing: Training-free long video generation with deep sink and participative compression. arXiv preprint arXiv:2512.05081, 2025.

[56]

Zengqun Zhao, Yanzuo Lu, Ziquan Liu, Jifei Song, Jiankang Deng, and Ioannis Patras. Relax forcing: Relaxed kv-memory for consistent long video generation. arXiv preprint arXiv:2603.21366, 2026.

[57]

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. In ICLR, 2024.

[58]

Yibin Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, and Jiaqi Wang. Pref-grpo: Pairwise preference reward-based grpo for stable text-to-image reinforcement learning. arXiv preprint arXiv:2508.20751, 2026.

[59]

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. In NeurIPS, 2023.

[60]

Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. In ICLR, 2026.

[61]

Xiaoxiao Ma, Haibo Qiu, Guohui Zhang, Zhixiong Zeng, Siqi Yang, Lin Ma, and Feng Zhao. Stage: Stable and generalizable grpo for autoregressive image generation. arXiv preprint arXiv:2509.25027, 2025.

[62]

Zehan Wang, Tengfei Wang, Haiyu Zhang, Xuhui Zuo, Junta Wu, Haoyuan Wang, Wenqiang Sun, Zhenwei Wang, Chenjie Cao, Hengshuang Zhao, Chunchao Guo, and Zhou Zhao. Worldcompass: Reinforcement learning for long-horizon world models. arXiv preprint arXiv:2602.09022, 2026.

[63]

Jialong Wu, Shaofeng Yin, Ningya Feng, and Mingsheng Long. Rlvr-world: Training world models with reinforcement learning. In ICLR, 2025.

[64]

Yang Ye, Tianyu He, Shuo Yang, and Jiang Bian. Reinforcement learning with inverse rewards for world model post-training. arXiv preprint arXiv:2509.23958, 2025.

[65]

Songchun Zhang, Zeyue Xue, Siming Fu, Jie Huang, Xianghao Kong, Y. Ma, Haoyang Huang, Nan Duan, and Anyi Rao. Astrolabe: Steering forward-process reinforcement learning for distilled autoregressive video models. arXiv preprint arXiv:2603.17051, 2026.

[66]

Kesen Zhao, Jiaxin Shi, Beier Zhu, Junbao Zhou, Xiaolong Shen, Yuan Zhou, Qianru Sun, and Hanwang Zhang. Real-time motion-controllable autoregressive video diffusion. In ICLR, 2026.

[67]

Guanjie Chen, Shirui Huang, Kai Liu, Jianchen Zhu, Xiaoye Qu, Peng Chen, Yu Cheng, and Yifu Sun. Flash-dmd: Towards high-fidelity few-step image generation with efficient distillation and joint reinforcement learning. arXiv preprint arXiv:2511.20549, 2025.

[68]

Xiefan Guo, Xinzhu Ma, Haoxiang Ma, Zihao Zhou, and Di Huang. Erudiff: Refactoring knowledge in diffusion models for advanced text-to-image synthesis. arXiv preprint arXiv:2603.20828, 2026.

[69]

Yihong Luo, Tianyang Hu, Weijian Luo, and Jing Tang. Tdm-r1: Reinforcing few-step diffusion models with non-differentiable reward. arXiv preprint arXiv:2603.07700, 2026.

[70]

Haoran He, Yuxiao Ye, Jie Liu, Jiajun Liang, Zhiyong Wang, Ziyang Yuan, Xintao Wang, Hangyu Mao, Pengfei Wan, and Ling Pan. Gardo: Reinforcing diffusion models without reward hacking. arXiv preprint arXiv:2512.24138, 2025.

[71]

Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Huang, and Wanli Ouyang. Unigrpo: Unified policy optimization for reasoning-driven visual generation. arXiv preprint arXiv:2603.23500, 2026.

[72]

Haotian Ye, Kaiwen Zheng, Jiashu Xu, Puheng Li, Huayu Chen, Jiaqi Han, Sheng Liu, Qinsheng Zhang, Hanzi Mao, Zekun Hao, Prithvijit Chattopadhyay, Dinghao Yang, et al. Data-regularized reinforcement learning for diffusion models at scale. arXiv preprint arXiv:2512.04332, 2025.

[73]

Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, and Vicky Kalogeiton. Diffusion reinforcement learning via centered reward distillation. arXiv preprint arXiv:2603.14128, 2026.

[74]

Dailan He, Guanlin Feng, Xingtong Ge, Yazhe Niu, Yi Zhang, Bingqi Ma, Guanglu Song, Yu Liu, and Hongsheng Li. Neighbor grpo: Contrastive ode policy optimization aligns flow models. In CVPR, 2026.

[75]

Yihong Luo, Tianyang Hu, and Jing Tang. Reinforcing diffusion models by direct group preference optimization. In ICLR, 2026.

[76]

Jiayuan Sheng, Hanyang Zhao, Haoxian Chen, David D. Yao, and Wenpin Tang. Understanding sampler stochasticity in training diffusion models for rlhf. arXiv preprint arXiv:2510.10767, 2025.

[77]

Feng Wang and Zihao Yu. Coefficients-preserving sampling for reinforcement learning with flow matching. arXiv preprint arXiv:2509.05952, 2025.

[78]

Shaomeng Wang, He Wang, Longquan Dai, and Jinhui Tang. Pc-flow: Preference alignment in flow matching via classifier. In AAAI, 2026.

[79]

Shengjun Zhang, Zhang Zhang, Chensheng Dai, and Yueqi Duan. E-grpo: High entropy steps drive effective reinforcement learning for flow models. In CVPR Findings, 2026.

[80]

Mingzhe Zheng, Weijie Kong, Yue Wu, Dengyang Jiang, Yue Ma, Xuanhua He, Bin Lin, Kaixiong Gong, Zhao Zhong, Liefeng Bo, Qifeng Chen, and Harry Yang. Manifold-aware exploration for reinforcement learning in video generation. arXiv preprint arXiv:2603.21872, 2026.

Showing first 80 references.

[45] [45]

Hiar: Efficient autoregressive long video generation via hierarchical denoising.arXiv preprint arXiv:2603.08703, 2026

Kai Zou, Dian Zheng, Hongbo Liu, Tiankai Hang, Bin Liu, and Nenghai Yu. Hiar: Efficient autoregressive long video generation via hierarchical denoising.arXiv preprint arXiv:2603.08703, 2026

work page arXiv 2026

[46] [46]

Past- and future-informed kv cache policy with salience estimation in autoregressive video diffusion.arXiv preprint arXiv:2601.21896, 2026

Hanmo Chen, Chenghao Xu, Xu Yang, Xuan Chen, and Cheng Deng. Past- and future-informed kv cache policy with salience estimation in autoregressive video diffusion.arXiv preprint arXiv:2601.21896, 2026

work page arXiv 2026

[47] [47]

Memflow: Flowing adaptive memory for consistent and efficient long video narratives.arXiv preprint arXiv:2512.14699, 2025

Sihui Ji, Xi Chen, Shuai Yang, Xin Tao, Pengfei Wan, and Hengshuang Zhao. Memflow: Flowing adaptive memory for consistent and efficient long video narratives.arXiv preprint arXiv:2512.14699, 2025

work page arXiv 2025

[48] [48]

Motionstream: Real-time video generation with interactive motion controls

Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, and Xun Huang. Motionstream: Real-time video generation with interactive motion controls. InICLR, 2026

2026

[49] [49]

Videossm: Autoregressive long video gen- eration with hybrid state-space memory.arXiv preprint arXiv:2512.04519, 2025

Yifei Yu, Xiaoshan Wu, Xinting Hu, Tao Hu, Yangtian Sun, Xiaoyang Lyu, Bo Wang, Lin Ma, Yuewen Ma, Zhongrui Wang, and Xiaojuan Qi. Videossm: Autoregressive long video generation with hybrid state-space memory.arXiv preprint arXiv:2512.04519, 2025

work page arXiv 2025

[50] [50]

Memorize-and-generate: Towards long-term consistency in real-time video generation.arXiv preprint arXiv:2512.18741, 2025

Tianrui Zhu, Shiyi Zhang, Zhirui Sun, Jingqi Tian, and Yansong Tang. Memorize-and-generate: Towards long-term consistency in real-time video generation.arXiv preprint arXiv:2512.18741, 2025

work page arXiv 2025

[51] [51]

Lol: Longer than longer, scaling video generation to hour.arXiv preprint arXiv:2601.16914, 2026

Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, and Cho-Jui Hsieh. Lol: Longer than longer, scaling video generation to hour.arXiv preprint arXiv:2601.16914, 2026

work page arXiv 2026

[52] [52]

Train short, inference long: Training-free horizon extension for autoregressive video generation.arXiv preprint arXiv:2602.14027, 2026

Jia Li, Xiaomeng Fu, Xurui Peng, Weifeng Chen, Youwei Zheng, Tianyu Zhao, Jiexi Wang, Fangmin Chen, Xing Wang, and Hayden Kwok-Hay So. Train short, inference long: Training-free horizon extension for autoregressive video generation.arXiv preprint arXiv:2602.14027, 2026

work page arXiv 2026

[53] [53]

Pathwise test-time correction for autoregressive long video generation.arXiv preprint arXiv:2602.05871, 2026

Xunzhi Xiang, Zixuan Duan, Guiyu Zhang, Haiyu Zhang, Zhe Gao, Junta Wu, Shaofeng Zhang, Tengfei Wang, Qi Fan, and Chunchao Guo. Pathwise test-time correction for autoregressive long video generation. arXiv preprint arXiv:2602.05871, 2026

work page arXiv 2026

[54] [54]

Infinity-rope: Action-controllable infinite video generation emerges from autoregressive self-rollout

Hidir Yesiltepe, Tuna Han Salih Meral, Adil Kaan Akan, Kaan Oktay, and Pinar Yanardag. Infinity-rope: Action-controllable infinite video generation emerges from autoregressive self-rollout. InCVPR, 2026

2026

[55] [55]

arXiv preprint arXiv:2512.05081 (2025)

Jung Yi, Wooseok Jang, Paul Hyunbin Cho, Jisu Nam, Heeji Yoon, and Seungryong Kim. Deep forcing: Training-free long video generation with deep sink and participative compression.arXiv preprint arXiv:2512.05081, 2025. 12

work page arXiv 2025

[56] [56]

Relax forcing: Relaxed kv-memory for consistent long video generation.arXiv preprint arXiv:2603.21366, 2026

Zengqun Zhao, Yanzuo Lu, Ziquan Liu, Jifei Song, Jiankang Deng, and Ioannis Patras. Relax forcing: Relaxed kv-memory for consistent long video generation.arXiv preprint arXiv:2603.21366, 2026

work page arXiv 2026

[57] [57]

Training diffusion models with reinforcement learning

Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning. InICLR, 2024

2024

[58] [58]

Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Yibin Wang, Zhimin Li, Yuhang Zang, Yujie Zhou, Jiazi Bu, Chunyu Wang, Qinglin Lu, Cheng Jin, and Jiaqi Wang. Pref-grpo: Pairwise preference reward-based grpo for stable text-to-image reinforcement learning.arXiv preprint arXiv:2508.20751, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[59] [59]

Imagereward: Learning and evaluating human preferences for text-to-image generation

Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. InNeurIPS, 2023

2023

[60] [60]

Diffusionnft: Online diffusion reinforcement with forward process

Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. InICLR, 2026

2026

[61] [61]

Stage: Stable and generalizable grpo for autoregressive image generation.arXiv preprint arXiv:2509.25027, 2025

Xiaoxiao Ma, Haibo Qiu, Guohui Zhang, Zhixiong Zeng, Siqi Yang, Lin Ma, and Feng Zhao. Stage: Stable and generalizable grpo for autoregressive image generation.arXiv preprint arXiv:2509.25027, 2025

work page arXiv 2025

[62] [62]

Worldcompass: Reinforce- ment learning for long-horizon world models,

Zehan Wang, Tengfei Wang, Haiyu Zhang, Xuhui Zuo, Junta Wu, Haoyuan Wang, Wenqiang Sun, Zhenwei Wang, Chenjie Cao, Hengshuang Zhao, Chunchao Guo, and Zhou Zhao. Worldcompass: Reinforcement learning for long-horizon world models.arXiv preprint arXiv:2602.09022, 2026

work page arXiv 2026

[63] [63]

Rlvr-world: Training world models with reinforcement learning

Jialong Wu, Shaofeng Yin, Ningya Feng, and Mingsheng Long. Rlvr-world: Training world models with reinforcement learning. InICLR, 2025

2025

[64] [64]

Reinforcement learning with inverse rewards for world model post-training.arXiv preprint arXiv:2509.23958, 2025

Yang Ye, Tianyu He, Shuo Yang, and Jiang Bian. Reinforcement learning with inverse rewards for world model post-training.arXiv preprint arXiv:2509.23958, 2025

work page arXiv 2025

[65] [65]

Ma, Haoyang Huang, Nan Duan, and Anyi Rao

Songchun Zhang, Zeyue Xue, Siming Fu, Jie Huang, Xianghao Kong, Y . Ma, Haoyang Huang, Nan Duan, and Anyi Rao. Astrolabe: Steering forward-process reinforcement learning for distilled autoregressive video models.arXiv preprint arXiv:2603.17051, 2026

work page arXiv 2026

[66] [66]

Real-time motion-controllable autoregressive video diffusion

Kesen Zhao, Jiaxin Shi, Beier Zhu, Junbao Zhou, Xiaolong Shen, Yuan Zhou, Qianru Sun, and Hanwang Zhang. Real-time motion-controllable autoregressive video diffusion. InICLR, 2026

2026

[67] [67]

Flash-dmd: Towards high-fidelity few-step image generation with efficient distillation and joint reinforcement learning.arXiv preprint arXiv:2511.20549, 2025

Guanjie Chen, Shirui Huang, Kai Liu, Jianchen Zhu, Xiaoye Qu, Peng Chen, Yu Cheng, and Yifu Sun. Flash-dmd: Towards high-fidelity few-step image generation with efficient distillation and joint reinforcement learning.arXiv preprint arXiv:2511.20549, 2025

work page arXiv 2025

[68] [68]

Erudiff: Refactoring knowledge in diffusion models for advanced text-to-image synthesis.arXiv preprint arXiv:2603.20828, 2026

Xiefan Guo, Xinzhu Ma, Haoxiang Ma, Zihao Zhou, and Di Huang. Erudiff: Refactoring knowledge in diffusion models for advanced text-to-image synthesis.arXiv preprint arXiv:2603.20828, 2026

work page arXiv 2026

[69] Yihong Luo, Tianyang Hu, Weijian Luo, and Jing Tang. Tdm-r1: Reinforcing few-step diffusion models with non-differentiable reward. arXiv preprint arXiv:2603.07700, 2026.

[70] Haoran He, Yuxiao Ye, Jie Liu, Jiajun Liang, Zhiyong Wang, Ziyang Yuan, Xintao Wang, Hangyu Mao, Pengfei Wan, and Ling Pan. Gardo: Reinforcing diffusion models without reward hacking. arXiv preprint arXiv:2512.24138, 2025.

[71] Jie Liu, Zilyu Ye, Linxiao Yuan, Shenhan Zhu, Yu Gao, Jie Wu, Kunchang Li, Xionghui Wang, Xiaonan Nie, Weilin Huang, and Wanli Ouyang. Unigrpo: Unified policy optimization for reasoning-driven visual generation. arXiv preprint arXiv:2603.23500, 2026.

[72] Haotian Ye, Kaiwen Zheng, Jiashu Xu, Puheng Li, Huayu Chen, Jiaqi Han, Sheng Liu, Qinsheng Zhang, Hanzi Mao, Zekun Hao, Prithvijit Chattopadhyay, Dinghao Yang, et al. Data-regularized reinforcement learning for diffusion models at scale. arXiv preprint arXiv:2512.04332, 2025.

[73] Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, and Vicky Kalogeiton. Diffusion reinforcement learning via centered reward distillation. arXiv preprint arXiv:2603.14128, 2026.

[74] Dailan He, Guanlin Feng, Xingtong Ge, Yazhe Niu, Yi Zhang, Bingqi Ma, Guanglu Song, Yu Liu, and Hongsheng Li. Neighbor grpo: Contrastive ode policy optimization aligns flow models. In CVPR, 2026.

[75] Yihong Luo, Tianyang Hu, and Jing Tang. Reinforcing diffusion models by direct group preference optimization. In ICLR, 2026.

[76] Jiayuan Sheng, Hanyang Zhao, Haoxian Chen, David D. Yao, and Wenpin Tang. Understanding sampler stochasticity in training diffusion models for rlhf. arXiv preprint arXiv:2510.10767, 2025.

[77] Feng Wang and Zihao Yu. Coefficients-preserving sampling for reinforcement learning with flow matching. arXiv preprint arXiv:2509.05952, 2025.

[78] Shaomeng Wang, He Wang, Longquan Dai, and Jinhui Tang. Pc-flow: Preference alignment in flow matching via classifier. In AAAI, 2026.

[79] Shengjun Zhang, Zhang Zhang, Chensheng Dai, and Yueqi Duan. E-grpo: High entropy steps drive effective reinforcement learning for flow models. In CVPR Findings, 2026.

[80] Mingzhe Zheng, Weijie Kong, Yue Wu, Dengyang Jiang, Yue Ma, Xuanhua He, Bin Lin, Kaixiong Gong, Zhao Zhong, Liefeng Bo, Qifeng Chen, and Harry Yang. Manifold-aware exploration for reinforcement learning in video generation. arXiv preprint arXiv:2603.21872, 2026.