FlashClear: Ultra-Fast Image Content Removal via Efficient Step Distillation and Feature Caching
Pith reviewed 2026-05-13 06:15 UTC · model grok-4.3
The pith
A distilled few-step diffusion model removes objects from images up to 122 times faster than prior methods while keeping visual quality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FlashClear trains a compact diffusion model via Region-aware Adversarial Distillation (RAD) that uses a latent discriminator to concentrate computation on foreground removal regions, then applies Foreground-Prioritized Asymmetric Attention and Caching (FPAC) to reuse features across the few remaining steps. On the OBER benchmark this yields speedups of 8.26 times over ObjectClear and 122 times over OmniPaint while matching or exceeding the base model's ability to erase objects and their visual effects without visible artifacts.
What carries the argument
Region-aware Adversarial Distillation (RAD), which trains a latent discriminator to produce a few-step removal model, combined with the training-free Foreground-Prioritized Asymmetric Attention and Caching (FPAC), which prioritizes foreground tokens and reuses cached features in background areas.
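The division of labor FPAC describes can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: `fpac_step`, the toy `compute_fn`, and the refresh schedule are invented for this sketch, and the real method operates inside a diffusion transformer's attention layers rather than on raw token arrays.

```python
import numpy as np

def fpac_step(tokens, mask, cache, compute_fn, refresh_background):
    """Hypothetical FPAC-style step: recompute foreground tokens every step,
    reuse cached background features unless a refresh is scheduled.
    tokens: (N, D) latent tokens; mask: (N,) bool, True = foreground."""
    out = np.empty_like(tokens)
    # Foreground (the removal region) is always recomputed at full cost.
    out[mask] = compute_fn(tokens[mask])
    if refresh_background or cache.get("bg") is None:
        # Occasionally refresh background features and store them.
        cache["bg"] = compute_fn(tokens[~mask])
    # Background reuses the features cached at an earlier step.
    out[~mask] = cache["bg"]
    return out

# Toy usage: a stand-in "denoiser" over a 4-step schedule that refreshes
# the background cache only on the first step.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
m = np.array([True, True, False, False, False, False, False, False])
cache = {}
for step in range(4):
    x = fpac_step(x, m, cache, compute_fn=lambda t: t * 0.9,
                  refresh_background=(step == 0))
```

The asymmetry is the point: foreground tokens pass through the network on every step, while background tokens are touched once and then served from cache, which is where the claimed savings would come from.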
If this is right
- Object removal inference time drops enough for real-time editing tools on consumer GPUs.
- The same base model can now process many more images in the same compute budget.
- No additional training or post-processing steps are required to reach the reported speed and quality.
- The approach preserves fidelity in eliminating both the target object and its associated lighting or shadow effects.
Where Pith is reading between the lines
- Similar distillation and caching patterns could accelerate other localized diffusion editing tasks such as inpainting or shadow removal.
- Extending the asymmetric caching across video frames might enable practical video object removal at interactive rates.
- The reduced step count could make these models run acceptably on mobile or edge devices without cloud offload.
Load-bearing premise
The latent discriminator and asymmetric caching will keep removal quality intact and avoid new artifacts on diverse real-world images without any extra fine-tuning.
What would settle it
Apply FlashClear to a new test set of complex scenes containing fine background textures and measure whether PSNR or perceptual quality scores fall below the original ObjectClear baseline or visible artifacts appear.
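For the PSNR half of such a test, the reference definition is standard; the `psnr` function below is a generic implementation, not code from the paper, and a perceptual score such as LPIPS would additionally require a learned-feature model.

```python
import numpy as np

def psnr(reference, result, peak=1.0):
    """PSNR in dB between a ground-truth removal result and a model output;
    both images as float arrays scaled to [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

# Sanity check on synthetic data: identical images give infinite PSNR,
# and adding noise lowers the score.
img = np.random.default_rng(1).random((64, 64, 3))
noisy = np.clip(img + 0.05 * np.random.default_rng(2).standard_normal(img.shape), 0, 1)
assert psnr(img, img) == float("inf")
assert psnr(img, noisy) < psnr(img, img)
```

Comparing such scores for FlashClear against the ObjectClear baseline on scenes with fine background textures would directly test whether the caching scheme degrades the regions it declines to recompute.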
Original abstract
Recently, diffusion-based object removal models have achieved impressive results in eliminating objects and their associated visual effects. However, they indiscriminately denoise all tokens across all timesteps, ignoring that removal usually involves small foreground regions. This strategy introduces substantial computational overhead and prolonged inference times. To overcome this computational burden, we propose a latent discriminator to implement Region-aware Adversarial Distillation (RAD), yielding a highly efficient few-step model named FlashClear. Furthermore, tailored to few-step diffusion models, we propose FPAC (Foreground-Prioritized Asymmetric Attention and Caching), a training-free acceleration strategy. Extensive experiments demonstrate that our framework provides massive acceleration while maintaining or exceeding the performance of our base model, ObjectClear. Notably, on the OBER benchmark, our FlashClear achieves up to 8.26× and 122× speedup over ObjectClear and OmniPaint, respectively, while maintaining high visual quality and fidelity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FlashClear for ultra-fast object removal in images. It proposes Region-aware Adversarial Distillation (RAD) that employs a latent discriminator to distill the base ObjectClear diffusion model into a few-step generator, combined with the training-free FPAC module that applies foreground-prioritized asymmetric attention and feature caching. The central empirical claim is that FlashClear delivers up to 8.26× speedup versus ObjectClear and 122× versus OmniPaint on the OBER benchmark while preserving or improving visual quality and fidelity.
Significance. If the quality-preservation claim holds under rigorous testing, the work would be significant for enabling real-time diffusion-based editing pipelines; the training-free character of FPAC and the region-aware distillation strategy constitute clear engineering contributions that could be adopted by follow-on systems.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): the headline speedups are reported without any tabulated quantitative quality metrics (PSNR, SSIM, LPIPS, FID), ablation tables, error bars, or statistical significance tests, so the assertion that quality is “maintained or exceeded” cannot be evaluated from the presented evidence.
- [§3.2 and §3.3] §3.2 (RAD) and §3.3 (FPAC): the claim that the latent discriminator yields a few-step model whose outputs match the base model’s fidelity, and that asymmetric caching introduces no visible artifacts on complex lighting/shadows/fine textures, rests on unverified assumptions; no per-region error maps, failure-case analysis, or cross-domain generalization tests are supplied to substantiate this load-bearing premise.
minor comments (2)
- [§3.1 and figures] Figure captions and §3.1 should explicitly define the latent discriminator architecture and the precise caching schedule used in FPAC so that the method is reproducible.
- [§4.1] The OBER benchmark description is missing details on image resolution, object size distribution, and evaluation protocol; these should be added for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comments highlight opportunities to strengthen the empirical validation of our quality claims, and we will revise the manuscript accordingly to address them directly.
Point-by-point responses
-
Referee: [Abstract and §4] Abstract and §4 (Experiments): the headline speedups are reported without any tabulated quantitative quality metrics (PSNR, SSIM, LPIPS, FID), ablation tables, error bars, or statistical significance tests, so the assertion that quality is “maintained or exceeded” cannot be evaluated from the presented evidence.
Authors: We agree that the current manuscript relies primarily on qualitative visual comparisons to support the claim of maintained or improved quality. In the revised version, we will add a dedicated table in §4 reporting PSNR, SSIM, LPIPS, and FID scores for FlashClear against ObjectClear and OmniPaint on the OBER benchmark. We will also include ablation tables for RAD and FPAC components, error bars from multiple random seeds, and statistical significance tests (e.g., paired t-tests) to enable rigorous evaluation of the quality assertions. The abstract will be updated to reference these quantitative results. revision: yes
-
Referee: [§3.2 and §3.3] §3.2 (RAD) and §3.3 (FPAC): the claim that the latent discriminator yields a few-step model whose outputs match the base model’s fidelity, and that asymmetric caching introduces no visible artifacts on complex lighting/shadows/fine textures, rests on unverified assumptions; no per-region error maps, failure-case analysis, or cross-domain generalization tests are supplied to substantiate this load-bearing premise.
Authors: The manuscript currently supports these claims through side-by-side visual results and qualitative inspection, but we acknowledge that additional quantitative and analytical evidence would make the arguments more robust. In the revision, we will add per-region error maps (highlighting foreground vs. background differences), a dedicated failure-case analysis subsection, and cross-domain generalization experiments on additional datasets beyond OBER to verify fidelity preservation under RAD and artifact-free behavior under FPAC. revision: yes
Circularity Check
No significant circularity; empirical claims rest on external benchmarks
Full rationale
The paper introduces RAD (adversarial distillation with latent discriminator) and training-free FPAC (asymmetric attention + caching) as acceleration techniques for a base diffusion removal model. No equations or derivations are presented that reduce by construction to fitted parameters or prior outputs. Speedup and quality claims are supported by direct empirical comparisons on the OBER benchmark against external baselines (ObjectClear, OmniPaint). Self-citation to the base model is present but not load-bearing for any derivation; it serves only as the starting point for measured acceleration. The method is explicitly training-free for FPAC and relies on reported experimental fidelity rather than any self-referential proof or uniqueness theorem.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Region-aware Adversarial Distillation (RAD) ... Foreground-Prioritized Asymmetric Attention and Caching (FPAC) ... 8.26× speedup ... 4-step inference
-
IndisputableMonolith/Foundation/DimensionForcing.lean · alexander_duality_circle_linking (unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
cross-attention maps ... binary spatial mask M ... asymmetric attention Q' = diag(M)Q
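One literal reading of the quoted Q' = diag(M)Q passage can be sketched in toy NumPy. The function name, the pass-through treatment of background rows, and all shapes below are assumptions made for illustration; the paper's actual attention and caching details are not specified in this excerpt.

```python
import numpy as np

def asymmetric_attention(Q, K, V, m):
    """Sketch of Q' = diag(M) Q: only foreground tokens (m == 1) emit
    queries; background rows are passed through unchanged here, where a
    cached scheme would instead reuse earlier features.
    Q, K, V: (N, D) token matrices; m: (N,) binary foreground mask."""
    Qp = np.diag(m) @ Q                       # zero out background queries
    scores = Qp @ K.T / np.sqrt(Q.shape[1])   # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ V
    # Keep attention output only for foreground rows; background rows
    # retain their input values.
    return np.where(m[:, None].astype(bool), out, V)

N, D = 6, 8
rng = np.random.default_rng(3)
Q, K, V = (rng.standard_normal((N, D)) for _ in range(3))
m = np.array([1, 1, 1, 0, 0, 0])
y = asymmetric_attention(Q, K, V, m)
```

Under this reading, the binary spatial mask M decides which tokens pay the quadratic attention cost, which is the mechanism the "unclear" tag is trying to match against the cited theorem.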
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Blended diffusion for text-driven editing of natural images
Omri Avrahami, Dani Lischinski, and Ohad Fried. Blended diffusion for text-driven editing of natural images. In CVPR, 2022
work page 2022
-
[2]
All are worth words: A vit backbone for diffusion models
Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, and Jun Zhu. All are worth words: A vit backbone for diffusion models. In CVPR, 2023
work page 2023
-
[3]
Token merging for fast stable diffusion
Daniel Bolya and Judy Hoffman. Token merging for fast stable diffusion. In CVPR, 2023
work page 2023
-
[4]
Instructpix2pix: Learning to follow image editing instructions
Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. In CVPR, 2023
work page 2023
-
[5]
∆-DiT: A training-free acceleration method tailored for diffusion transformers
Pengtao Chen, Mingzhu Shen, Peng Ye, Jianjian Cao, Chongjun Tu, Christos-Savvas Bouganis, Yiren Zhao, and Tao Chen. ∆-DiT: A training-free acceleration method tailored for diffusion transformers. arXiv preprint arXiv:2406.01125, 2024
-
[6]
Anydoor: Zero-shot object-level image customization
Xi Chen, Lianghua Huang, Yu Liu, Yujun Shen, Deli Zhao, and Hengshuang Zhao. Anydoor: Zero-shot object-level image customization. In CVPR, 2024
work page 2024
-
[7]
Latentpaint: Image inpainting in latent space with diffusion models
Ciprian Corneanu, Raghudeep Gadde, and Aleix M Martinez. Latentpaint: Image inpainting in latent space with diffusion models. In WACV, 2024
work page 2024
-
[8]
Clipaway: Harmonizing focused embeddings for removing objects via diffusion models
Yiğit Ekin, Ahmet B Yildirim, Erdem E Caglar, Aykut Erdem, Erkut Erdem, and Aysegul Dundar. Clipaway: Harmonizing focused embeddings for removing objects via diffusion models. In NeurIPS, 2024
work page 2024
-
[9]
HiCache: A plug-in scaled-hermite upgrade for taylor-style cache-then-forecast diffusion acceleration
Liang Feng, Shikang Zheng, Jiacheng Liu, Yuqi Lin, Qinming Zhou, Peiliang Cai, Xinyu Wang, Junjie Chen, Chang Zou, Yue Ma, et al. HiCache: A plug-in scaled-hermite upgrade for taylor-style cache-then-forecast diffusion acceleration. In ICLR, 2026
work page 2026
-
[10]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020
work page 2020
-
[11]
LoRA: Low-rank adaptation of large language models
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. LoRA: Low-rank adaptation of large language models. In ICLR, 2022
work page 2022
-
[12]
Designedit: Multi-layered latent decomposition and fusion for unified & accurate image editing
Yueru Jia, Yuhui Yuan, Aosong Cheng, Chuke Wang, Ji Li, Huizhu Jia, and Shanghang Zhang. Designedit: Multi-layered latent decomposition and fusion for unified & accurate image editing. In AAAI, 2025
work page 2025
-
[13]
Smarteraser: Remove anything from images using masked-region guidance
Longtao Jiang, Zhendong Wang, Jianmin Bao, Wengang Zhou, Dongdong Chen, Lei Shi, Dong Chen, and Houqiang Li. Smarteraser: Remove anything from images using masked-region guidance. In CVPR, 2025
work page 2025
-
[14]
Brushnet: A plug-and-play image inpainting model with decomposed dual-branch diffusion
Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, and Qiang Xu. Brushnet: A plug-and-play image inpainting model with decomposed dual-branch diffusion. In ECCV, 2024
work page 2024
-
[15]
Imagic: Text-based real image editing with diffusion models
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. In CVPR, 2023
work page 2023
-
[16]
Bk-sdm: A lightweight, fast, and cheap version of stable diffusion
Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells, and Shinkook Choi. Bk-sdm: A lightweight, fast, and cheap version of stable diffusion. In ECCV, 2024
work page 2024
-
[17]
Auto-Encoding Variational Bayes
Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013
work page 2013
-
[18]
RORem: Training a robust object remover with human-in-the-loop
Ruibin Li, Tao Yang, Song Guo, and Lei Zhang. RORem: Training a robust object remover with human-in-the-loop. In CVPR, 2025
work page 2025
-
[19]
Snapfusion: Text-to-image diffusion model on mobile devices within two seconds
Yanyu Li, Huan Wang, Qing Jin, Ju Hu, Pavlo Chemerys, Yun Fu, Yanzhi Wang, Sergey Tulyakov, and Jian Ren. Snapfusion: Text-to-image diffusion model on mobile devices within two seconds. In NeurIPS, 2023
work page 2023
-
[20]
Sdxl-lightning: Progressive adversarial diffusion distillation
Shanchuan Lin, Anran Wang, and Xiao Yang. Sdxl-lightning: Progressive adversarial diffusion distillation. arXiv preprint arXiv:2402.13929, 2024
-
[21]
Structure matters: Tackling the semantic discrepancy in diffusion models for image inpainting
Haipeng Liu, Yang Wang, Biao Qian, Meng Wang, and Yong Rui. Structure matters: Tackling the semantic discrepancy in diffusion models for image inpainting. In CVPR, 2024
work page 2024
-
[22]
From reusing to forecasting: Accelerating diffusion models with taylorseers
Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Junjie Chen, and Linfeng Zhang. From reusing to forecasting: Accelerating diffusion models with taylorseers. In ICCV, 2025
work page 2025
-
[23]
Pseudo numerical methods for diffusion models on manifolds
Luping Liu, Yi Ren, Zhijie Lin, and Zhou Zhao. Pseudo numerical methods for diffusion models on manifolds. In ICLR, 2022
work page 2022
-
[24]
Instaflow: One step is enough for high-quality diffusion-based text-to-image generation
Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, et al. Instaflow: One step is enough for high-quality diffusion-based text-to-image generation. In ICLR, 2024
work page 2024
-
[25]
Erase diffusion: Empowering object removal through calibrating diffusion pathways
Yi Liu, Hao Zhou, Benlei Cui, Wenxiang Shang, and Ran Lin. Erase diffusion: Empowering object removal through calibrating diffusion pathways. In CVPR, 2025
work page 2025
-
[26]
Simplifying, stabilizing and scaling continuous-time consistency models
Cheng Lu and Yang Song. Simplifying, stabilizing and scaling continuous-time consistency models. ICLR, 2025
work page 2025
-
[27]
Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Dpm-solver: A fast ode solver for diffusion probabilistic model sampling in around 10 steps. In NeurIPS, 2022
work page 2022
-
[28]
Repaint: Inpainting using denoising diffusion probabilistic models
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. Repaint: Inpainting using denoising diffusion probabilistic models. In CVPR, 2022
work page 2022
-
[29]
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023
work page 2023
-
[30]
Deepcache: Accelerating diffusion models for free
Xinyin Ma, Gongfan Fang, and Xinchao Wang. Deepcache: Accelerating diffusion models for free. In CVPR, 2024
work page 2024
-
[31]
Hd-painter: high-resolution and prompt-faithful text-guided image inpainting with diffusion models
Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. Hd-painter: high-resolution and prompt-faithful text-guided image inpainting with diffusion models. In ICLR, 2025
work page 2025
-
[32]
Sdedit: Guided image synthesis and editing with stochastic differential equations
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022
work page 2022
-
[33]
Swiftbrush: One-step text-to-image diffusion model with variational score distillation
Thuan Hoang Nguyen and Anh Tran. Swiftbrush: One-step text-to-image diffusion model with variational score distillation. In CVPR, 2024
work page 2024
-
[34]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021
work page 2021
-
[35]
Scalable diffusion models with transformers
William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, 2023
work page 2023
-
[36]
SDXL: Improving latent diffusion models for high-resolution image synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. In ICLR, 2024
work page 2024
-
[37]
SpotEdit: Selective region editing in diffusion transformers
Zhibin Qin, Zhenxiong Tan, Zeqing Wang, Songhua Liu, and Xinchao Wang. SpotEdit: Selective region editing in diffusion transformers. arXiv preprint arXiv:2512.22323, 2025
-
[38]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022
work page 2022
-
[39]
RORD: A real-world object removal dataset
Min-Cheol Sagong, Yoon-Jae Yeo, Seung-Won Jung, and Sung-Jea Ko. RORD: A real-world object removal dataset. In BMVC, 2022
work page 2022
-
[40]
Palette: Image-to-image diffusion models
Chitwan Saharia, William Chan, Huiwen Chang, Chris Lee, Jonathan Ho, Tim Salimans, David Fleet, and Mohammad Norouzi. Palette: Image-to-image diffusion models. In ACM SIGGRAPH, 2022
work page 2022
-
[41]
Progressive distillation for fast sampling of diffusion models
Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. In ICLR, 2022
work page 2022
-
[42]
Adversarial diffusion distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation. In ECCV, 2024
work page 2024
-
[43]
Fora: Fast-forward caching in diffusion transformer acceleration
Pratheba Selvaraju, Tianyu Ding, Tianyi Chen, Ilya Zharkov, and Luming Liang. Fora: Fast-forward caching in diffusion transformer acceleration. arXiv preprint arXiv:2407.01425, 2024
-
[44]
Denoising diffusion implicit models
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021
work page 2021
-
[45]
Consistency models
Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In ICML, 2023
work page 2023
-
[46]
Attentive eraser: Unleashing diffusion model’s object removal potential via self-attention redirection guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui, and Jingqun Tang. Attentive eraser: Unleashing diffusion model’s object removal potential via self-attention redirection guidance. In AAAI, 2025
work page 2025
-
[47]
Resolution-robust large mask inpainting with fourier convolutions
Roman Suvorov, Elizaveta Logacheva, Anton Mashikhin, Anastasia Remizova, Arsenii Ashukha, Aleksei Silvestrov, Naejin Kong, Harshith Goka, Kiwoong Park, and Victor Lempitsky. Resolution-robust large mask inpainting with fourier convolutions. In WACV, 2022
work page 2022
-
[48]
Omnieraser: Remove objects and their effects in images with paired video-frame data
Runpu Wei, Zijin Yin, Shuo Zhang, Lanxiang Zhou, Xueyi Wang, Chao Ban, Tianwei Cao, Hao Sun, Zhongjiang He, Kongming Liang, et al. Omnieraser: Remove objects and their effects in images with paired video-frame data. arXiv preprint arXiv:2501.07397, 2025
-
[49]
ObjectDrop: Bootstrapping counterfactuals for photorealistic object removal and insertion
Daniel Winter, Matan Cohen, Shlomi Fruchter, Yael Pritch, Alex Rav-Acha, and Yedid Hoshen. ObjectDrop: Bootstrapping counterfactuals for photorealistic object removal and insertion. In ECCV, 2024
work page 2024
-
[50]
Quantcache: Adaptive importance-guided quantization with hierarchical latent and layer caching for video generation
Junyi Wu, Zhiteng Li, Zheng Hui, Yulun Zhang, Linghe Kong, and Xiaokang Yang. Quantcache: Adaptive importance-guided quantization with hierarchical latent and layer caching for video generation. In ICCV, 2025
work page 2025
-
[51]
SmartBrush: Text and shape guided object inpainting with diffusion model
Shaoan Xie, Zhifei Zhang, Zhe Lin, Tobias Hinz, and Kun Zhang. SmartBrush: Text and shape guided object inpainting with diffusion model. In CVPR, 2023
work page 2023
-
[52]
Ufogen: You forward once large scale text-to-image generation via diffusion gans
Yanwu Xu, Yang Zhao, Zhisheng Xiao, and Tingbo Hou. Ufogen: You forward once large scale text-to-image generation via diffusion gans. In CVPR, 2024
work page 2024
-
[53]
Eedit: Rethinking the spatial and temporal redundancy for efficient image editing
Zexuan Yan, Yue Ma, Chang Zou, Wenteng Chen, Qifeng Chen, and Linfeng Zhang. Eedit: Rethinking the spatial and temporal redundancy for efficient image editing. In ICCV, 2025
work page 2025
-
[54]
Improved distribution matching distillation for fast image synthesis
Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Fredo Durand, and William T Freeman. Improved distribution matching distillation for fast image synthesis. In NeurIPS, 2024
work page 2024
-
[55]
One-step diffusion with distribution matching distillation
Tianwei Yin, Michaël Gharbi, Richard Zhang, Eli Shechtman, Fredo Durand, William T Freeman, and Taesung Park. One-step diffusion with distribution matching distillation. In CVPR, 2024
work page 2024
-
[56]
WaveFill: A wavelet-based generation network for image inpainting
Yingchen Yu, Fangneng Zhan, Shijian Lu, Jianxiong Pan, Feiying Ma, Xuansong Xie, and Chunyan Miao. WaveFill: A wavelet-based generation network for image inpainting. In ICCV, 2021
work page 2021
-
[57]
Omnipaint: Mastering object-oriented editing via disentangled insertion-removal inpainting
Yongsheng Yu, Ziyun Zeng, Haitian Zheng, and Jiebo Luo. Omnipaint: Mastering object-oriented editing via disentangled insertion-removal inpainting. In ICCV, 2025
work page 2025
-
[58]
Training-free and hardware-friendly acceleration for diffusion models via similarity-based token pruning
Evelyn Zhang, Jiayi Tang, Xuefei Ning, and Linfeng Zhang. Training-free and hardware-friendly acceleration for diffusion models via similarity-based token pruning. In AAAI, 2025
work page 2025
-
[59]
Adding conditional control to text-to-image diffusion models
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In ICCV, 2023
work page 2023
-
[60]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, 2018
work page 2018
-
[61]
Cross-attention makes inference cumbersome in text-to-image diffusion models
Wentian Zhang, Haozhe Liu, Jinheng Xie, Francesco Faccio, Mike Zheng Shou, and Jürgen Schmidhuber. Cross-attention makes inference cumbersome in text-to-image diffusion models. TMLR, 2025
work page 2025
-
[62]
Precise object and effect removal with adaptive target-aware attention
Jixin Zhao, Zhouxia Wang, Peiqing Yang, and Shangchen Zhou. Precise object and effect removal with adaptive target-aware attention. In CVPR, 2026
work page 2026
-
[63]
Unipc: A unified predictor-corrector framework for fast sampling of diffusion models
Wenliang Zhao, Lujia Bai, Yongming Rao, Jie Zhou, and Jiwen Lu. Unipc: A unified predictor-corrector framework for fast sampling of diffusion models. In NeurIPS, 2023
work page 2023
-
[64]
GeoRemover: Removing objects and their causal visual artifacts
Zixin Zhu, Haoxiang Li, Xuelu Feng, He Wu, Chunming Qiao, and Junsong Yuan. GeoRemover: Removing objects and their causal visual artifacts. In NeurIPS, 2025
work page 2025
-
[65]
A task is worth one word: Learning with task prompts for high-quality versatile image inpainting
Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, and Kai Chen. A task is worth one word: Learning with task prompts for high-quality versatile image inpainting. In ECCV, 2024
work page 2024
-
[66]
DisCa: Accelerating video diffusion transformers with distillation-compatible learnable feature caching
Chang Zou, Changlin Li, Yang Li, Patrol Li, Jianbing Wu, Xiao He, Songtao Liu, Zhao Zhong, Kailin Huang, and Linfeng Zhang. DisCa: Accelerating video diffusion transformers with distillation-compatible learnable feature caching. In CVPR, 2026
work page 2026
-
[67]
Accelerating diffusion transformers with token-wise feature caching
Chang Zou, Xuyang Liu, Ting Liu, Siteng Huang, and Linfeng Zhang. Accelerating diffusion transformers with token-wise feature caching. In ICLR, 2025
work page 2025