{"total":13,"items":[{"citing_arxiv_id":"2605.22050","ref_index":23,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations","primary_cat":"cs.CV","submitted_at":"2026-05-21T06:36:59+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20237","ref_index":45,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AnimeAdapter: A Modular Adapter for Appearance-Consistent Anime Character Generation","primary_cat":"cs.CV","submitted_at":"2026-05-17T07:40:20+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10019","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"The two clocks and the innovation window: When and how generative models learn rules","primary_cat":"cs.LG","submitted_at":"2026-05-11T05:44:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Generative models learn rules before memorizing data, creating an innovation window whose width depends on dataset size and rule complexity, observed in both diffusion and autoregressive architectures.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03623","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Few-Step Generative Model on Cumulative Flow 
Maps","primary_cat":"cs.LG","submitted_at":"2026-05-05T10:51:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Cumulative flow maps unify few-step generative modeling for diffusion and flow models via cumulative transport and parameterization with minimal changes to time embeddings and objectives.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00329","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Fast Text-to-Audio Generation with One-Step Sampling via Energy-Scoring and Auxiliary Contextual Representation Distillation","primary_cat":"cs.SD","submitted_at":"2026-05-01T01:13:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A one-step text-to-audio model using energy-distance training and contextual distillation outperforms prior fast baselines on AudioCaps and achieves up to 8.5x faster inference than the multi-step IMPACT system with competitive quality.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"AudioTurbo≈2000 1.1B 5 22.18 - 1.30 8.88 - - - - AudioTurbo≈2000 1.1B 10 20.65 - 1.29 9.40 - - - - AUDIODEAR w/o Dist. 1700 191M 1 22.09 3.82 1.22 8.07 0.298 - - 2.61 AUDIODEAR 1700 191M 1 18.67 2.79 1.06 9.66 0.334 4.27±0.04 3.27±0.06 2.61 energy-scoring head are provided in Appendix E. During training, we apply a masking rate randomly sampled from the range [70,100) to the audio latents, enabling masked generative modeling with the energy-distance objective. For representation distillation, we adopt the transformer backbone of the diffusion-based state-of-the-art model IMPACT (Huang et al., 2025) as the teacher, and integrate the distillation loss with the energy-distance objective using a distillation weight λ = 1000, as defined in Equation 5. 
Unless otherwise specified, we train with a batch size of 2048 and a learning rate of 1e−3. At inference time, we follow IMPACT by setting the number of decoding iterations to 64. Following related work (Ma et al., 2025), we apply classifier-free guidance during inference, with CFG scale set to 4.0. Ablation studies and implementation details on CFG can be found in Appendix F. 4.3. Evaluation We evaluate our proposed TTA generation framework using both objective and subjective metrics. For objective assessment, we report Fréchet distance (FD; Heusel et al. 2017), Fréchet audio distance (FAD; Kilgour et al. 2018), Kullback-Leibler divergence (KL), and inception score (IS; Salimans et al. 2016) following the AudioLDM evaluation protocol 2, and CLAP similarity (Wu et al., 2023) using the same pre-trained CLAP model employed by IMPACT. The CLAP model used for training 3 is different from the one used for evaluation 4 to avoid taking advantage of training and evaluating with the same model. Subjective evaluation is conducted on 90 generated audio samples conditioned on the AudioCaps evaluation set prompts, using the user interface and rating criteria defined in AudioBox (Vyas et al., 2023). 
Each sample receiv"},{"citing_arxiv_id":"2604.16879","ref_index":32,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Adaptive Forensic Feature Refinement via Intrinsic Importance Perception","primary_cat":"cs.CV","submitted_at":"2026-04-18T07:07:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"I2P adaptively selects the most discriminative layers from visual foundation models for synthetic image detection and constrains task updates to low-sensitivity parameter subspaces to improve specificity without harming generalization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.16055","ref_index":33,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving","primary_cat":"cs.CV","submitted_at":"2025-12-18T00:41:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A platform using flow matching for real-world image generation and an adversarial policy creates challenging corner cases to evaluate end-to-end autonomous driving models like UniAD and VAD, showing performance degradation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.06982","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"IntrinsicWeather: Controllable Weather Editing in Intrinsic Space","primary_cat":"cs.CV","submitted_at":"2025-08-09T13:29:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A diffusion framework decomposes images into intrinsic maps via an inverse renderer and renders controllable weather changes via a forward renderer with CLIP 
prompt interpolation and map-aware attention, outperforming pixel-space baselines on new 38k synthetic and 18k real datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.02242","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sampling-Aware Quantization for Diffusion Models","primary_cat":"cs.CV","submitted_at":"2025-05-04T20:50:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A quantization technique for diffusion models that aligns sampling trajectories to preserve high-order sampler performance under quantization noise.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.12242","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"OmniPrism: Learning Disentangled Visual Concept for Image Generation","primary_cat":"cs.CV","submitted_at":"2024-12-16T18:59:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"OmniPrism proposes a disentanglement method using a new paired dataset (PCD-200K), COD contrastive training, and block embeddings to inject separated concepts into diffusion models for multi-aspect image generation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2308.06721","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models","primary_cat":"cs.CV","submitted_at":"2023-08-13T08:34:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"IP-Adapter adds effective image prompting to text-to-image diffusion models using a lightweight decoupled 
cross-attention adapter that works alongside text prompts and other controls.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"where x0 represents the real data with an additional condition c, t ∈ [0, T ] denotes the time step of diffusion process, xt = αtx0 + σtϵ is the noisy data at t step, and αt, σt are predefined functions of t that determine the diffusion process. Once the model ϵθ is trained, images can be generated from random noise in an iterative manner. Generally, fast samplers such as DDIM [21], PNDM [36] and DPM-Solver [37, 38], are adopted in the inference stage to accelerate the generation process. For the conditional diffusion models, classifier guidance [23] is a straightforward technique used to balance image fidelity and sample diversity by utilizing gradients from a separately trained classifier. To eliminate the need for training 1https://github."},{"citing_arxiv_id":"2211.01095","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models","primary_cat":"cs.LG","submitted_at":"2022-11-02T13:14:30+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DPM-Solver++ enables high-quality guided sampling of diffusion models in 15-20 steps via data-prediction ODE solving and multistep stabilization.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2210.08402","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LAION-5B: An open large-scale dataset for training next generation image-text models","primary_cat":"cs.CV","submitted_at":"2022-10-16T00:08:18+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LAION-5B is an openly released dataset of 5.85 
billion CLIP-filtered image-text pairs that enables replication of foundational vision-language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}