{"total":11,"items":[{"citing_arxiv_id":"2605.23345","ref_index":62,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models","primary_cat":"cs.CV","submitted_at":"2026-05-22T08:06:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21484","ref_index":56,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration","primary_cat":"cs.CV","submitted_at":"2026-05-20T17:59:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Fixed-Point Distillation constructs one-step correction targets for discrete diffusion generators via partial corruption and single teacher refinement, lifted into continuous features with a multi-bandwidth drift loss and straight-through estimation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20708","ref_index":65,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Rethinking Cross-Layer Information Routing in Diffusion Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-20T05:07:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DAR replaces residual addition in DiTs with learnable, timestep-adaptive aggregation of sublayer outputs, yielding 2.11 FID improvement on SiT-XL/2 and 8.75x faster convergence on ImageNet 256x256.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17019","ref_index":72,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"StreamingEffect: Real-Time Human-Centric Video Effect Generation","primary_cat":"cs.CV","submitted_at":"2026-05-16T14:45:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"StreamingEffect enables real-time 720p human-centric video effect generation on one GPU via teacher-student distillation, keyframe control, and a new 130K video dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14513","ref_index":41,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"HASTE: Training-Free Video Diffusion Acceleration via Head-Wise Adaptive Sparse Attention","primary_cat":"cs.CV","submitted_at":"2026-05-14T07:57:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HASTE delivers up to 1.93x speedup on Wan2.1 video DiTs via head-wise adaptive sparse attention using temporal mask reuse and error-guided per-head calibration while preserving video quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12496","ref_index":54,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"CausalCine: Real-Time Autoregressive Generation for Multi-Shot Video Narratives","primary_cat":"cs.CV","submitted_at":"2026-05-12T17:59:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CausalCine enables real-time causal autoregressive multi-shot video generation via multi-shot training, content-aware memory routing for coherence, and distillation to few-step inference.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"CAMR retrieves useful long-range context and maintains a streamlined memory representation, improving cross-shot coherence without sacrificing causal generation. Finally, we distill the causal multi-shot base model into a few-step generator for real-time interactive synthesis. Because causality and multi-shot structure have already been learned by the full-step model, Distribution Matching Distillation (DMD) [54, 53] can focus on trajectory compression while preserving visual quality and cross-shot consistency. The resulting model generates videos chunk by chunk with KV caching, supports prompt updates during generation, and continues a sequence without recomputing previous shots. The resulting system enables real-time online directing for long-form video generation."},{"citing_arxiv_id":"2605.03849","ref_index":38,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation","primary_cat":"cs.CV","submitted_at":"2026-05-05T15:15:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Stream-R1 improves distillation of autoregressive streaming video diffusion models by adaptively weighting supervision with a reward model at both rollout and per-pixel levels.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08995","ref_index":48,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory","primary_cat":"cs.CV","submitted_at":"2026-04-10T06:00:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Matrix-Game 3.0 delivers 720p real-time video generation at 40 FPS with minute-scale memory consistency by combining residual self-correction training, camera-aware memory injection, and DMD-based autoregressive distillation on a 5B model.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"training [24] to learn self-correction for the base model, better aligning with multi-segment generation process. Deployment. Industrial system suggests that real-time high-resolution interaction is achievable [2], but training recipes and full inference pipelines are often undisclosed. To this end, we introduce a multi-segment distillation method for bidirectional models, inspired by Distribution Matching Distillation (DMD) [48] and Self-Forcing [19] paradigms, reducing error accumulation yet achieving streaming inference. We further deploy a series of acceleration techniques to achieve 40 FPS generation at 720p resolution for 5B parameter model. (e.g. DiT quantization, VAE pruning, retrieval via GPU, etc.) 2 Related Works 2.1 Video Generation Models Recent video generation models have largely converged toward Diffusion Transformer (DiT)-based"},{"citing_arxiv_id":"2511.22699","ref_index":89,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer","primary_cat":"cs.CV","submitted_at":"2025-11-27T18:52:07+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"[87] Yang Ye, Xianyi He, Zongjian Li, Bin Lin, Shenghai Yuan, Zhiyuan Yan, Bohan Hou, and Li Yuan. Imgedit: A unified image editing dataset and benchmark. arXiv preprint arXiv:2505.20275, 2025. [88] Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T Freeman. Improved distribution matching distillation for fast image synthesis. In NeurIPS, 2024. [89] Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6613-6623, 2024. [90] Qifan Yu, Wei Chow, Zhongqi Yue, Kaihang Pan, Yang Wu, Xiaoyang Wan, Juncheng Li, Siliang"},{"citing_arxiv_id":"2510.02283","ref_index":69,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Self-Forcing++: Towards Minute-Scale High-Quality Video Generation","primary_cat":"cs.CV","submitted_at":"2025-10-02T17:55:42+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Self-Forcing++ scales autoregressive video diffusion to over 4 minutes by using self-generated segments for guidance, reducing error accumulation and outperforming baselines in fidelity and consistency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.19519","ref_index":42,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Preserve and Personalize: Personalized Text-to-Image Diffusion Models without Distributional Drift","primary_cat":"cs.CV","submitted_at":"2025-05-26T05:03:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Proposes Lipschitz regularization during fine-tuning to prevent distributional drift in personalized diffusion models, improving subject fidelity and prompt adherence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}