On the Controllability-Fidelity Frontier in Diffusion Editing
Pith reviewed 2026-06-27 20:19 UTC · model grok-4.3
The pith
Diffusion image editing obeys mathematical bounds on reconstruction error, repeated-edit stability, and change locality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper derives mathematical formulations of editing objectives and provides theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. Its experiments reveal key failure modes such as identity drift, prompt sensitivity, and compositional errors.
What carries the argument
The controllability-fidelity frontier, constructed from editing objectives, noise dynamics, and bounds on reconstruction error, stability, and locality.
If this is right
- Repeated edits accumulate error according to the derived stability bound.
- Edits must remain local or the locality bound is violated.
- Mask-localized and instruction-guided algorithmic frameworks reduce certain errors.
- Concept-erasure techniques are required to limit misuse risks.
- Prompt sensitivity and compositional errors are systematic rather than accidental.
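The paper's actual stability bound is not reproduced on this page. As an illustration only, the sketch below assumes a generic worst-case model: each edit is an L-Lipschitz map that adds at most `eps` of fresh error, so the error after k edits satisfies e_k <= L * e_(k-1) + eps. The function names and the parameters `eps` and `lipschitz` are hypothetical and are not taken from the paper.

```python
def error_bound_recurrence(eps: float, lipschitz: float, n_edits: int) -> float:
    """Iterate e_k = lipschitz * e_{k-1} + eps, starting from e_0 = 0."""
    err = 0.0
    for _ in range(n_edits):
        err = lipschitz * err + eps
    return err


def error_bound_closed_form(eps: float, lipschitz: float, n_edits: int) -> float:
    """Closed form of the same recurrence (a geometric sum)."""
    if lipschitz == 1.0:
        return eps * n_edits
    return eps * (lipschitz ** n_edits - 1.0) / (lipschitz - 1.0)


if __name__ == "__main__":
    # Assumed values, chosen only to show the three growth regimes.
    for lipschitz in (0.9, 1.0, 1.1):
        bounds = [error_bound_closed_form(0.01, lipschitz, n) for n in (1, 5, 20)]
        print(lipschitz, [round(b, 4) for b in bounds])
```

Under this assumed model the bound saturates near eps / (1 - L) when L < 1, grows linearly when L = 1, and grows geometrically when L > 1. Whether the paper's bound has this form cannot be checked from the summary.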
Where Pith is reading between the lines
- The same bounds may apply to video or 3D diffusion editing with only minor changes to the noise and guidance terms.
- Iterative editing workflows could be redesigned to reset inversion error at each step rather than letting it compound.
- Compositional errors suggest that current score-guidance formulations under-constrain interactions between multiple objects.
- The frontier description supplies a concrete target for training new models that optimize directly against the derived bounds.
Load-bearing premise
That the four tested methods and the chosen metrics are representative enough to characterize the frontier for all diffusion editing pipelines.
What would settle it
A new editing method that stays inside the stated reconstruction-error and stability bounds yet shows none of the listed failure modes on the same tasks would falsify the claimed frontier.
Original abstract
Diffusion-based generative models enable powerful image editing capabilities, but achieving precise control while maintaining fidelity and safety remains challenging. We present a comprehensive theoretical and empirical study of controllable diffusion-based image editing, analyzing the trade-offs between adherence to user intent, preservation of non-target content, and output quality. Our work spans text- and mask-guided edits, point/drag manipulation, and inversion-based pipelines. We derive mathematical formulations of editing objectives and analyze dynamics of noise injection, score guidance, and inversion error. We provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. We propose algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, and present extensive experiments comparing state-of-the-art methods (e.g., TF-ICON \cite{lu2023tficone}, DragFlow \cite{zhou2025dragflow}, InstructPix2Pix \cite{brooks2023instructpix2pix}, UltraEdit \cite{zhao2024ultraedit}) on multiple tasks and metrics (FID, identity similarity, CLIP alignment, artifact scores, etc.). Our results reveal key failure modes, such as identity drift, prompt sensitivity, and compositional errors. We also discuss ethical considerations in image editing, including misuse risks, bias, consent, and concept erasure techniques (e.g., MACE \cite{lu2024mace}, ANT \cite{li2025ant}, EraseAnything \cite{gao2024eraseanything}) as safeguards. We conclude with best practices and future directions for responsible, high-fidelity diffusion-based editing.
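The pseudocode for the mask-localized framework is not included in this summary. As a rough illustration of the locality idea the abstract mentions, the sketch below composites an edited image with its source under a binary mask, so pixels outside the mask are copied from the source, and then measures the largest change outside the mask. The helper names `blend_with_mask` and `max_change_outside_mask` are hypothetical, and images are plain nested lists of floats rather than tensors.

```python
from typing import List

Image = List[List[float]]


def blend_with_mask(source: Image, edited: Image, mask: Image) -> Image:
    """Per-pixel composite: mask * edited + (1 - mask) * source."""
    return [
        [m * e + (1.0 - m) * s for s, e, m in zip(src_row, edit_row, mask_row)]
        for src_row, edit_row, mask_row in zip(source, edited, mask)
    ]


def max_change_outside_mask(source: Image, output: Image, mask: Image) -> float:
    """Largest absolute pixel change where the mask is 0 (a simple locality check)."""
    changes = [
        abs(o - s)
        for src_row, out_row, mask_row in zip(source, output, mask)
        for s, o, m in zip(src_row, out_row, mask_row)
        if m == 0.0
    ]
    return max(changes, default=0.0)


if __name__ == "__main__":
    source = [[0.2, 0.4], [0.6, 0.8]]
    edited = [[0.9, 0.9], [0.9, 0.9]]  # stand-in for an editor's raw output
    mask = [[1.0, 0.0], [0.0, 0.0]]    # only the top-left pixel may change
    output = blend_with_mask(source, edited, mask)
    print(output)
    print(max_change_outside_mask(source, output, mask))
```

In diffusion pipelines a composite of this kind is commonly applied to noisy latents at each denoising step rather than once to final pixels. The single-shot pixel version here is a simplification chosen to keep the example self-contained.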
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to derive mathematical formulations of editing objectives for diffusion-based image editing and provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. It proposes algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, conducts experiments comparing TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit on metrics including FID, identity similarity, CLIP alignment, and artifact scores, identifies failure modes such as identity drift, prompt sensitivity, and compositional errors, and discusses ethical considerations including misuse risks and concept erasure techniques.
Significance. If the derived bounds are rigorous and the empirical characterization of failure modes generalizes, the work could provide useful guidelines for balancing controllability and fidelity in diffusion editing pipelines. The inclusion of pseudocode and explicit discussion of ethical safeguards are positive elements that support reproducibility and responsible use.
major comments (2)
- [Experiments (abstract description)] The experiments (as summarized in the abstract) compare only TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit without stated justification that these cover key axes of variation such as mask vs. text guidance, inversion vs. direct editing, or different noise schedules. This selection is load-bearing for the central claim that the derived bounds on reconstruction error and repeated-edit stability, as well as the reported failure modes, characterize the controllability-fidelity frontier across diffusion editing pipelines.
- [Theoretical analysis (abstract description)] The abstract asserts derivation of theoretical bounds on reconstruction error and stability, but the provided text does not include the specific equations, assumptions, or proof sketches needed to assess whether these bounds are independent of the chosen methods or hold under the listed failure modes.
minor comments (2)
- [Abstract] The abstract contains the citation key 'lu2023tficone', which appears to be a misspelling (the cited method is TF-ICON) and should be checked against the bibliography.
- [Experiments (abstract description)] The manuscript would benefit from explicit statements of data exclusion rules and error analysis for the reported metrics, as these are referenced in the abstract but not detailed in the summary.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on experimental justification and theoretical presentation. We address each major comment below and will incorporate revisions to strengthen the manuscript.
-
Referee: [Experiments (abstract description)] The experiments (as summarized in the abstract) compare only TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit without stated justification that these cover key axes of variation such as mask vs. text guidance, inversion vs. direct editing, or different noise schedules. This selection is load-bearing for the central claim that the derived bounds on reconstruction error and repeated-edit stability, as well as the reported failure modes, characterize the controllability-fidelity frontier across diffusion editing pipelines.
Authors: We agree that an explicit mapping of the selected methods to the key axes is needed to support generalizability of the bounds and failure-mode analysis. The four methods were chosen to span mask-guided (TF-ICON), drag/point manipulation (DragFlow), instruction-based text editing (InstructPix2Pix), and advanced inversion pipelines (UltraEdit). In the revision we will add a dedicated paragraph in the experimental setup section that justifies this selection against the axes listed by the referee and explicitly notes any uncovered variations (e.g., certain noise schedules) as a limitation.
Revision: yes
-
Referee: [Theoretical analysis (abstract description)] The abstract asserts derivation of theoretical bounds on reconstruction error and stability, but the provided text does not include the specific equations, assumptions, or proof sketches needed to assess whether these bounds are independent of the chosen methods or hold under the listed failure modes.
Authors: The derivations appear in Section 3 with the core equations and assumptions, and full proofs are in the appendix; however, to improve accessibility we will move the key reconstruction-error and stability bounds (including their assumptions) into the main text together with a concise proof sketch. We will also add a short discussion linking the bounds to the observed failure modes (identity drift, prompt sensitivity) so readers can directly evaluate independence from specific methods.
Revision: yes
Circularity Check
No circularity: derivations and bounds presented as independent from inputs
The abstract and provided excerpts describe derivation of editing objective formulations, analysis of noise/score/inversion dynamics, and theoretical bounds on reconstruction error, repeated-edit stability, and locality. These are positioned as first-principles results from the dynamics, not reductions of fitted parameters or self-citations. Empirical sections compare to external cited methods (TF-ICON, DragFlow, InstructPix2Pix, UltraEdit) using standard metrics without evidence that the bounds or failure-mode characterizations are forced by construction from those inputs. Ethical citations (MACE, ANT, EraseAnything) are to distinct authors and function as external references. No load-bearing step reduces to self-definition, renamed known results, or unverified self-citation chains; the work is self-contained against external benchmarks.
Reference graph
Works this paper leans on
-
[1]
Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolinário Passos. Ledits++: Limitless image editing using text-to-image models. arXiv preprint arXiv:2311.16711,
-
[2]
Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
-
[3]
Mingdeng Cao, Xinya Yang, Zhongang Wang, Changyue Sun, Ying Wu, Dandan Li, and Qingming Huang. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In IEEE International Conference on Computer Vision (ICCV), 2023.
-
[4]
Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie. Offset: Segmentation-based focus shift revision for composed image retrieval. In Proceedings of the ACM International Conference on Multimedia, pages 6113–6122, 2025.
-
[5]
Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan. Hud: Hierarchical uncertainty-aware disambiguation network for composed video retrieval. In Proceedings of the ACM International Conference on Multimedia, pages 6143–6152, 2025.
-
[6]
Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. Intent: Invariance and discrimination-aware noise mitigation for robust composed image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 20463–20471, 2026.
-
[7]
Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. ILVR: Conditioning method for denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14347–14356, 2021.
-
[8]
Guillaume Couairon, Jakob Verbeek, Holger Schwenk, and Matthieu Cord. Diffedit: Diffusion-based semantic image editing with mask guidance. arXiv preprint arXiv:2210.11427, 2022.
-
[9]
Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, and Limin Wang. StableDrag: Stable dragging for point-based image editing. In Proceedings of the European Conference on Computer Vision (ECCV), 2024.
-
[10]
Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, and Fan Tang. FireFlow: Fast inversion of rectified flow for image semantic editing. arXiv preprint arXiv:2412.07517, 2024.
-
[11]
Pierre Fernandez, Alexandre Sablayrolles, Matthijs Douze, Hervé Jégou, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. In IEEE International Conference on Computer Vision (ICCV), 2023.
-
[12]
Coalition for Content Provenance and Authenticity (C2PA). C2pa technical specification. Online Specification, 2024. Versioned specification available at the C2PA specifications website.
-
[13]
Zhiheng Fu, Zixu Li, Zhiwei Chen, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. Pair: Complementarity-guided disentanglement for composed image retrieval. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.
-
[14]
Zhiheng Fu, Yupeng Hu, Qianyun Yang, Shiqi Zhang, Zhiwei Chen, and Zixu Li. Air-know: Arbiter-calibrated knowledge-internalizing robust network for composed image retrieval,
-
[15]
Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Eliyahu Nachmani, and Amir Globerson. Erasing concepts from diffusion models. In IEEE International Conference on Computer Vision (ICCV), 2023.
-
[16]
Daiheng Gao, Shilin Lu, Shaw Walters, Wenbo Zhou, Jiaming Chu, Jie Zhang, Bang Zhang, Mengxi Jia, Jian Zhao, Zhaoxin Fan, et al. Eraseanything: Enabling concept erasure in rectified flow transformers. arXiv preprint arXiv:2412.20413, 2024.
-
[17]
Daiheng Gao, Nanxiang Jiang, Andi Zhang, Shilin Lu, Yufei Tang, Wenbo Zhou, Weiming Zhang, and Zhaoxin Fan. Revoking amnesia: Rl-based trajectory optimization to resurrect erased concepts in diffusion models. arXiv preprint arXiv:2510.03302, 2025.
-
[18]
Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Yaowei Wang, and Xiu Li. Hqg-net: Unpaired medical image enhancement with high-quality guidance. IEEE Transactions on Neural Networks and Learning Systems, 2023.
-
[19]
Chunming He, Kai Li, Guoxia Xu, Yulun Zhang, Runze Hu, Zhenhua Guo, and Xiu Li. Degradation-resistant unfolding network for heterogeneous image fusion. In ICCV, pages 12611–12621, 2023.
-
[20]
Chunming He, Kai Li, Yachao Zhang, Longxiang Tang, Yulun Zhang, Zhenhua Guo, and Xiu Li. Camouflaged object detection with feature decomposition and edge reconstruction. In CVPR, pages 22046–22055, 2023.
-
[21]
Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, and Xiu Li. Weakly-supervised concealed object segmentation with sam-based pseudo labeling and multi-scale feature grouping. NeurIPS, 36, 2024.
-
[22]
Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, and Fisher Yu. Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects. ICLR, 2024.
-
[23]
Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, and Sina Farsiu. Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model. ICLR, 2025.
-
[24]
Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Longxiang Tang, Yulun Zhang, Linghe Kong, and Sina Farsiu. Segment concealed object with incomplete supervision. TPAMI,
-
[25]
Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey. TPAMI, 2025.
-
[26]
Chunming He, Fengyang Xiao, Rihan Zhang, Chengyu Fang, Deng-Ping Fan, and Sina Farsiu. Reversible unfolding network for concealed visual perception with generative refinement. arXiv preprint arXiv:2508.15027, 2025.
-
[27]
Chunming He, Rihan Zhang, Zheng Chen, Bowen Yang, Chengyu Fang, Yunlong Lin, Fengyang Xiao, and Sina Farsiu. Unfoldldm: Deep unfolding-based blind image restoration with latent diffusion priors. arXiv preprint arXiv:2511.18152, 2025.
-
[28]
Chunming He, Rihan Zhang, Longxiang Tang, Ziyun Yang, Kai Li, Deng-Ping Fan, and Sina Farsiu. Scaler: Sam-enhanced collaborative learning for label-deficient concealed object segmentation. arXiv preprint arXiv:2511.18136, 2025.
-
[29]
Chunming He, Rihan Zhang, Fengyang Xiao, Chenyu Fang, Longxiang Tang, Yulun Zhang, Linghe Kong, Deng-Ping Fan, Kai Li, and Sina Farsiu. Run: Reversible unfolding network for concealed object segmentation. ICML, 2025.
-
[30]
Chunming He, Rihan Zhang, Dingming Zhang, Fengyang Xiao, Deng-Ping Fan, and Sina Farsiu. Nested unfolding network for real-world concealed object segmentation. arXiv preprint arXiv:2511.18164, 2025.
-
[31]
Chunming He, Rihan Zhang, Fengyang Xiao, Chengyu Fang, Longxiang Tang, Yulun Zhang, and Sina Farsiu. Unfoldir: Rethinking deep unfolding network in illumination degradation image restoration. CVPR, 2026.
-
[32]
Chunming He, Rihan Zhang, Fengyang Xiao, Dingming Zhang, Zhiwen Cao, and Sina Farsiu. Refining context-entangled content segmentation via curriculum selection and anti-curriculum promotion. ICML, 2026.
-
[33]
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022.
-
[34]
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems (NeurIPS), pages 6626–6637, 2017.
-
[35]
Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
-
[36]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
-
[37]
Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. Refine: Composed video retrieval via shared and differential semantics enhancement. ACM Transactions on Multimedia Computing, Communications and Applications, 2026.
-
[38]
Qinlei Huang, Zhiwei Chen, Zixu Li, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. Median: Adaptive intermediate-grained aggregation network for composed image retrieval. In ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025.
-
[39]
Dawei Yang Shuoyu Li Shuo Wang Xing Hu Chen Xu Zukang Xu Changyong Shu JiangYong Yu, Sifan Zhou and Zhihang Yuan. Mquant: Unleashing the inference potential of multimodal large language models via static quantization. In Proceedings of the 33rd ACM International Conference on Multimedia, 2025.
-
[40]
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Boosting diffusion-based editing with 3 lines of code. arXiv preprint arXiv:2310.01506, 2023.
-
[41]
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
-
[42]
Dilfira Kudrat, Zongxia Xie, Yanru Sun, Tianyu Jia, and Qinghua Hu. Patch-wise structural loss for time series forecasting. In Forty-second International Conference on Machine Learning, 2025.
-
[43]
Leyang Li, Shilin Lu, Yan Ren, and Adams Wai-Kin Kong. Set you straight: Auto-steering denoising trajectories to sidestep unwanted concepts. arXiv preprint arXiv:2504.12782, 2025.
-
[44]
Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan. Encoder: Entity mining and modification relation binding for composed image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 5101–5109, 2025.
-
[45]
Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, and Liqiang Nie. Finecir: Explicit parsing of fine-grained modification semantics for composed image retrieval. https://arxiv.org/abs/2503.21309, 2025.
-
[46]
Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. Retrack: Evidence-driven dual-stream directional anchor calibration network for composed video retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 23373–23381, 2026.
-
[47]
Zixu Li, Yupeng Hu, Zhiwei Chen, Mingyu Zhang, Zhiheng Fu, and Liqiang Nie. Conesep: Cone-based robust noise-unlearning compositional network for composed image retrieval, 2026.
-
[48]
Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. Habit: Chrono-synergia robust progressive learning framework for composed image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6762–6770, 2026.
-
[49]
Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, and Liqiang Nie. Tema: Anchor the image, follow the text for multi-modification composed image retrieval, 2026.
-
[50]
Xvyuan Liu, Xiangfei Qiu, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Jilin Hu, and Bin Yang. Rethinking irregular time series forecasting: A simple yet effective baseline. arXiv preprint arXiv:2505.11250, 2025.
-
[51]
Shilin Lu, Yanzhu Liu, and Adams Wai-Kin Kong. Tf-icon: Diffusion-based training-free cross-domain image composition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2294–2305, 2023.
-
[52]
Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. Mace: Mass concept erasure in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6430–6440,
-
[53]
Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. Robust watermarking using generative priors against image editing: From benchmarking to advances. arXiv preprint arXiv:2410.18775, 2024.
-
[54]
Shilin Lu, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, and Adams Wai-Kin Kong. Does flux already know how to perform physically plausible image composition? arXiv preprint arXiv:2509.21278, 2025.
-
[55]
Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11461–11471, 2022.
-
[56]
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073, 2021.
-
[57]
Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
-
[58]
Wenzhe Niu, Zongxia Xie, Yanru Sun, Wei He, Man Xu, and Chao Hao. Langtime: A language-guided unified model for time series forecasting with proximal policy optimization. In Forty-second International Conference on Machine Learning,
-
[59]
Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu, Abhimitra Meka, and Christian Theobalt. Drag your GAN: Interactive point-based manipulation on the generative image manifold. In ACM SIGGRAPH 2023 Conference Proceedings, 2023.
-
[60]
Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhiheng Fu, Xuemeng Song, and Yupeng Hu. Melt: Improve composed image retrieval via the modification frequentation-rarity balance network. arXiv preprint arXiv:2603.29291,
-
[61]
Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, and Bin Yang. TFB: Towards comprehensive and fair benchmarking of time series forecasting methods. In Proc. VLDB Endow., pages 2363–2377, 2024.
-
[62]
Xiangfei Qiu, Hanyin Cheng, Xingjian Wu, Jilin Hu, and Chenjuan Guo. A comprehensive survey of deep learning for multivariate time series forecasting: A channel strategy perspective. arXiv preprint arXiv:2502.10721, 2025.
-
[63]
Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, and Bin Yang. Tab: Unified benchmarking of time series anomaly detection methods. In Proc. VLDB Endow., 2025.
-
[64]
Xiangfei Qiu, Xingjian Wu, Hanyin Cheng, Xvyuan Liu, Chenjuan Guo, Jilin Hu, and Bin Yang. Dbloss: Decomposition-based loss function for time series forecasting. In NeurIPS, 2025.
-
[65]
Xiangfei Qiu, Xingjian Wu, Yan Lin, Chenjuan Guo, Jilin Hu, and Bin Yang. DUET: Dual clustering enhanced multivariate time series forecasting. In SIGKDD, pages 1185–1196, 2025.
-
[66]
Xiangfei Qiu, Yuhan Zhu, Zhengyu Li, Hanyin Cheng, Xingjian Wu, Chenjuan Guo, Bin Yang, and Jilin Hu. Dag: A dual causal network for time series forecasting with exogenous variables. arXiv preprint arXiv:2509.14933, 2025.
-
[67]
Yan Ren, Shilin Lu, and Adams Wai-Kin Kong. All that glitters is not gold: Key-secured 3d secrets within 3d gaussian splatting. arXiv preprint arXiv:2503.07191, 2025.
-
[68]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
-
[69]
Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. Rectified flow based inversion for real image editing. arXiv preprint arXiv:2410.10792, 2024.
-
[70]
Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y F Tan, and Song Bai. Dragdiffusion: Harnessing diffusion models for interactive point-based image editing. arXiv preprint arXiv:2306.14435,
-
[71]
Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021.
-
[72]
Yanru Sun, Zongxia Xie, Yanhong Chen, Xin Huang, and Qinghua Hu. Solar wind speed prediction with two-dimensional attention mechanism. Space Weather, 19(7):e2020SW002707, 2021.
-
[73]
Yanru Sun, Zongxia Xie, Yanhong Chen, and Qinghua Hu. Accurate solar wind speed prediction with multimodality information. Space: Science & Technology, 2022.
-
[74]
Yanru Sun, Emadeldeen Eldele, Zongxia Xie, Yucheng Wang, Wenzhe Niu, Qinghua Hu, Chee Keong Kwoh, and Min Wu. Adapting llms to time series forecasting via temporal heterogeneity modeling and semantic alignment. arXiv preprint arXiv:2508.07195, 2025.
-
[75]
Yanru Sun, Zongxia Xie, Dongyue Chen, Emadeldeen Eldele, and Qinghua Hu. Hierarchical classification auxiliary network for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 20743–20751, 2025.
-
[76]
Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Qinghua Hu, and Min Wu. Learning pattern-specific experts for time series forecasting under patch-level distribution shift. Advances in Neural Information Processing Systems, 2025.
-
[77]
Yanru Sun, Zongxia Xie, Haoyu Xing, Hualong Yu, and Qinghua Hu. Ppgf: Probability pattern-guided time series forecasting. IEEE Transactions on Neural Networks and Learning Systems, 2025.
-
[78]
Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven image-to-image translation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
-
[79]
Xingjian Wu, Xiangfei Qiu, Hongfan Gao, Jilin Hu, Bin Yang, and Chenjuan Guo. K²VAE: A koopman-kalman enhanced variational autoencoder for probabilistic time series forecasting. In ICML, 2025.
-
[80]
Xingjian Wu, Xiangfei Qiu, Zhengyu Li, Yihang Wang, Jilin Hu, Chenjuan Guo, Hui Xiong, and Bin Yang. CATCH: Channel-aware multivariate time series anomaly detection via frequency patching. In ICLR, 2025.