{"total":14,"items":[{"citing_arxiv_id":"2605.20624","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Accelerating Video Inverse Problem Solvers with Autoregressive Diffusion Models","primary_cat":"cs.CV","submitted_at":"2026-05-20T02:16:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AVIS applies autoregressive diffusion models to video inverse problems by streaming restoration with measurement-consistent initialization, reducing latency from 114s to 4s and raising throughput to 1.18 FPS (or 5.91 FPS in the Flash variant).","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16949","ref_index":5,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Beyond Point-Wise Matching: Structural Representation Alignment for Accelerating Diffusion Transformers","primary_cat":"cs.CV","submitted_at":"2026-05-16T12:01:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"sREPA enforces structural consistency in relational geometry of pre-trained vision features to accelerate DiT training and improve generation quality.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15980","ref_index":2,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization","primary_cat":"cs.CV","submitted_at":"2026-05-15T14:13:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Flash-GRPO is a one-step GRPO framework for video diffusion alignment that applies iso-temporal grouping and temporal gradient rectification to achieve higher alignment quality and stability than full-trajectory 
training under low compute budgets on 1.3B-14B models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14486","ref_index":13,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Reduce the Artifacts Bias for More Generalizable AI-Generated Image Detection","primary_cat":"cs.CV","submitted_at":"2026-05-14T07:26:36+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SEF introduces GAN upsampling for diverse artifacts and expert fusion to reduce domain interference, yielding stronger generalization on 13 benchmarks for AI-generated image detection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13010","ref_index":6,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Amortized Guidance for Image Inpainting with Pretrained Diffusion Models","primary_cat":"cs.CV","submitted_at":"2026-05-13T05:02:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AID amortizes guidance for diffusion inpainting by training a reusable module via an auxiliary Gaussian formulation and continuous-time actor-critic algorithm, improving quality-speed trade-off with under 1% overhead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10971","ref_index":37,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-08T18:52:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Adaptive scheduling of interventions in discrete diffusion language models, timed to 
attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"trastive Vectors (mean class difference [15, 16]), Probe (logistic regression weight [17]), and PCA (top principal component of contrastive activations). For LLaDA and DREAM, we additionally evaluate Prompt Steering, prepending an attribute instruction to the prompt (§C.8). We note that other baselines such as ILRR [14] require a reference sequence (a different problem setup), and classifier-based guidance methods [37] require differentiable classifiers integrated into the denoising loop and are not directly applicable to masked discrete diffusion. Metrics: Steering performance is measured by off-the-shelf classifier confidence: DistilBERT-SST2 [38] for sentiment, BERT-AG News [39] for topic, and a RoBERTa formality ranker [33] for"},{"citing_arxiv_id":"2605.06421","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"FREPix: Frequency-Heterogeneous Flow Matching for Pixel-Space Image Generation","primary_cat":"cs.CV","submitted_at":"2026-05-07T15:27:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FREPix achieves competitive FID scores on ImageNet by decomposing image generation into separate low- and high-frequency paths within a flow matching framework.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06104","ref_index":39,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision 
Transformer","primary_cat":"cs.LG","submitted_at":"2026-05-07T12:20:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04358","ref_index":12,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Intermediate Representations are Strong AI-Generated Image Detectors","primary_cat":"cs.CV","submitted_at":"2026-05-05T23:26:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Intermediate layer embedding sensitivity to perturbations distinguishes AI-generated images from real ones, yielding higher AUROC on GenImage and Forensics Small benchmarks than prior methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00503","ref_index":8,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"End-to-End Autoregressive Image Generation with 1D Semantic Tokenizer","primary_cat":"cs.CV","submitted_at":"2026-05-01T08:25:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An end-to-end autoregressive model with a jointly trained 1D semantic tokenizer achieves state-of-the-art FID 1.48 on ImageNet 256x256 generation without guidance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22379","ref_index":3,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Efficient Diffusion Distillation via Embedding 
Loss","primary_cat":"cs.CV","submitted_at":"2026-04-24T09:16:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Embedding Loss aligns feature distributions via MMD in random network embeddings to boost one-step diffusion distillation, reaching SOTA FID of 1.475 on CIFAR-10 unconditional generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08960","ref_index":39,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-04-10T05:04:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Proposes mean flow policies and LeJEPA loss to overcome Gaussian policy limits and weak subgoal generation in hierarchical offline GCRL, reporting strong results on OGBench state and pixel tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"need: Skills and exploration emerge from contrastive rl without rewards, demonstrations, or subgoals. arXiv preprint arXiv:2408.05804, 2024. [38] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International conference on machine learning, pages 2256-2265. pmlr, 2015. [39] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780-8794, 2021. [40] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. 
Scaling rectified flow transformers for high-resolution image synthesis."},{"citing_arxiv_id":"2604.08313","ref_index":1,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"Weakly-Supervised Lung Nodule Segmentation via Training-Free Guidance of 3D Rectified Flow","primary_cat":"cs.CV","submitted_at":"2026-04-09T14:46:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Training-free guidance of a pretrained 3D rectified flow model enables weakly-supervised lung nodule segmentation using only image-level labels and produces improved results on the LUNA16 dataset.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04335","ref_index":9,"ref_count":1,"confidence":0.55,"is_internal_anchor":false,"paper_title":"GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads","primary_cat":"cs.DC","submitted_at":"2026-04-06T01:02:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GENSERVE improves SLO attainment by up to 44% for co-serving heterogeneous T2I and T2V diffusion workloads via step-level preemption, elastic parallelism, and joint scheduling.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"sequence parallelism with dynamic batching, and an SLO-aware scheduler that jointly optimizes resource allocation across all concurrent requests. Experimental results show that GENSERVE improves the SLO attainment rate by up to 44% over the strongest baseline across diverse configurations. 1 Introduction Recently, diffusion models have achieved remarkable breakthroughs and widespread popularity [9, 13, 30], demonstrating unparalleled performance in generative tasks. 
From U-Net architectures [13, 30, 33] to Diffusion Transformers (DiT) [5, 10, 26], these models have established themselves as the dominant paradigm for both text-to-image (T2I) [27, 29, 31] and text-to-video (T2V) [6, 32] generation. As demand grows, production platforms need to serve"}],"limit":50,"offset":0}