{"total":14,"items":[{"citing_arxiv_id":"2606.27696","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Class-frequency Guided Noise Schedule for Diffusion Models","primary_cat":"cs.LG","submitted_at":"2026-06-26T03:43:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes CFRG noise schedule for diffusion models that assigns larger noises to low-frequency classes to improve generation on imbalanced datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.26032","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution","primary_cat":"cs.CV","submitted_at":"2026-05-25T17:01:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SKILD unifies unconditional image generation and continuous super-resolution in one diffusion model via scale-invariant k-space dynamics where the reverse process handles both tasks by varying only the starting timestep.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19050","ref_index":72,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative Pseudo-Force Fields for Molecular Generation","primary_cat":"cs.LG","submitted_at":"2026-05-18T19:14:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Proposes generative pseudo-force fields trained on quadratic pseudo-potentials from noisy equilibria as a time-step-agnostic diffusion variant for efficient molecular conformation generation with high validity on 
QM9.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18749","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"WavFlow: Audio Generation in Waveform Space","primary_cat":"cs.SD","submitted_at":"2026-05-18T17:59:10+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"WavFlow performs direct waveform audio generation via flow matching on 2D token grids from raw patches plus amplitude lifting, matching latent-based methods on VGGSound and AudioCaps without intermediate compression.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16126","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Entropy Across the Bridge: Conditional-Marginal Discretization for Flow and Schr\\\"odinger Samplers","primary_cat":"cs.LG","submitted_at":"2026-05-15T16:11:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Derives a conditional-marginal entropy-rate objective for bridge-aware discretization that yields U-shaped schedules and improves low-NFE sample quality on 2D, CIFAR-10, and protein tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11773","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Is Monotonic Sampling Necessary in Diffusion Models?","primary_cat":"cs.LG","submitted_at":"2026-05-12T08:45:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Non-monotonic sampling schedules never improve upon monotonic baselines in diffusion models, with performance gaps ranging from substantial to negligible depending on the 
denoiser.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Architecture work - Karras et al. (2024b) (EDM2), Peebles & Xie (2023) (DiT), Rombach et al. (2022) (Latent Diffusion) - inherits the scheduling assumption. A large body of work then optimises the shape of the monotonic schedule: the cosine schedule (Nichol & Dhariwal, 2021), the learnable VDM (Kingma et al., 2021), resolution-dependent logSNR shifts (Chen, 2023; Hoogeboom et al., 2023), Min-SNR (Hang et al., 2023) and importance sampling around log SNR = 0 (Hang et al., 2024), terminal-SNR fixes (Lin et al., 2024), solver-aware step placement (Sabour et al., 2024), weighted-ELBO objectives (Kingma & Gao, 2023), and SNR refinements in HDiT (Crowson et al., 2024). Every entry above optimises within the space of monotonic schedules."},{"citing_arxiv_id":"2511.13720","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Back to Basics: Let Denoising Generative Models Denoise","primary_cat":"cs.CV","submitted_at":"2025-11-17T18:59:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Directly predicting clean data with large-patch pixel Transformers enables strong generative performance in diffusion models where noise prediction fails at high dimensions.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"to pixel or other high-dimensional spaces, existing diffusion models can still struggle to address the curse of dimensionality [4]. The heavy reliance on a pre-trained latent space prevents diffusion models from being self-contained. In pursuit of a self-contained principle, there has been strong focus on advancing diffusion modeling in the pixel space [7, 25, 26, 6, 70]. 
In general, these methods explicitly or implicitly avoid the information bottleneck in the networks, e.g., by using dense convolutions or smaller patches, increasing channels, or adding long skip connections. We suggest that these designs may stem from the demand to predict high-dimensional noised quantities. In this paper, we return to first principles and let the neu-"},{"citing_arxiv_id":"2511.00062","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Simulation with Video Foundation Models for Physical AI","primary_cat":"cs.CV","submitted_at":"2025-10-28T22:44:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Cosmos-Predict2.5 unifies text-to-world, image-to-world, and video-to-world generation in one model trained on 200M clips with RL post-training, delivering improved quality and control for physical AI.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Planning with reasoning using vision language world model. arXiv preprint arXiv:2509.02722, 2025. 35 [14] Sili Chen, Hengkai Guo, Shengnan Zhu, Feihu Zhang, Zilong Huang, Jiashi Feng, and Bingyi Kang. Video depth anything: Consistent depth estimation for super-long videos. In CVPR, 2025. 19 38 World Simulation with Video Foundation Models for Physical AI [15] Ting Chen. On the importance of noise scheduling for diffusion models. arXiv preprint arXiv:2301.10972, 2023. 8 [16] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin CM Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In RSS, 2023. 22 [17] Databricks. 
Delta lake: Open-source storage framework that enables building lakehouses."},{"citing_arxiv_id":"2510.02307","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"NoiseShift: Resolution-Aware Noise Recalibration for Better Low-Resolution Image Generation","primary_cat":"cs.CV","submitted_at":"2025-10-02T17:59:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"NoiseShift learns a resolution-specific mapping from scheduler noise to conditioning noise via lightweight calibration to restore consistency and improve low-resolution generation quality in models like SD3 and Flux.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.16827","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Blur: A Fluid Perspective on Generative Diffusion Models","primary_cat":"cs.GR","submitted_at":"2025-06-20T08:31:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Proposes an advection-diffusion PDE corruption process with stochastic velocity fields and Lattice Boltzmann solver for diffusion models, generalizing prior PDE methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2504.07940","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond the Frame: Generating 360 Panoramic Videos from Perspective Videos","primary_cat":"cs.CV","submitted_at":"2025-04-10T17:51:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A generative model produces realistic and coherent 360 panoramic videos from in-the-wild perspective videos via curated online data and geometry-motion aware 
operations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.06764","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"History-Guided Video Diffusion","primary_cat":"cs.LG","submitted_at":"2025-02-10T18:44:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DFoT enables flexible history conditioning in video diffusion, with history guidance methods that boost temporal consistency and support long rollouts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2501.03575","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cosmos World Foundation Model Platform for Physical AI","primary_cat":"cs.CV","submitted_at":"2025-01-07T06:55:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"Gr-2: A generative video-language-action model with web-scale knowledge for robot manipulation. arXiv preprint arXiv:2410.06158, 2024. 57 [22] Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174, 2016. 23 [23] Ting Chen. On the importance of noise scheduling for diffusion models. arXiv preprint arXiv:2301.10972, 2023. 22 [24] Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, et al. Panda-70m: Captioning 70m videos with multiple cross-modality teachers. In CVPR, 2024. 
7, 16 [25] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song."},{"citing_arxiv_id":"2310.15110","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model","primary_cat":"cs.CV","submitted_at":"2023-10-23T17:18:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Zero123++ produces high-quality 3D-consistent multi-view images from a single input by fine-tuning Stable Diffusion with targeted conditioning and training methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}