On the Controllability-Fidelity Frontier in Diffusion Editing

Emily Davis; Finn Carter; Leying Yi; Yi Hu

arxiv: 2606.09901 · v1 · pith:DQJFOJNInew · submitted 2026-06-05 · 💻 cs.GR · cs.CV· cs.HC· cs.LG· cs.MM

On the Controllability-Fidelity Frontier in Diffusion Editing

Yi Hu , Leying Yi , Emily Davis , Finn Carter This is my paper

Pith reviewed 2026-06-27 20:19 UTC · model grok-4.3

classification 💻 cs.GR cs.CVcs.HCcs.LGcs.MM

keywords diffusion modelsimage editingcontrollabilityfidelitytheoretical boundsfailure modesethical considerations

0 comments

The pith

Diffusion image editing obeys mathematical bounds on reconstruction error, repeated-edit stability, and change locality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper maps the controllability-fidelity frontier by writing explicit mathematical objectives for text-guided, mask-guided, and drag-based edits, then deriving bounds that follow from noise injection, score guidance, and inversion steps. Experiments on current pipelines show these bounds are tight in practice and produce recurring problems such as identity drift and prompt sensitivity. The work also supplies pseudocode frameworks for localized editing and discusses safeguards against misuse. A reader would care because the same limits appear whenever anyone tries to change one part of an image without disturbing the rest.

Core claim

The paper derives mathematical formulations of editing objectives and provides theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes while revealing key failure modes such as identity drift, prompt sensitivity, and compositional errors.

What carries the argument

The controllability-fidelity frontier, constructed from editing objectives, noise dynamics, and bounds on reconstruction error, stability, and locality.

If this is right

Repeated edits accumulate error according to the derived stability bound.
Edits must remain local or the locality bound is violated.
Mask-localized and instruction-guided algorithmic frameworks reduce certain errors.
Concept-erasure techniques are required to limit misuse risks.
Prompt sensitivity and compositional errors are systematic rather than accidental.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bounds may apply to video or 3D diffusion editing with only minor changes to the noise and guidance terms.
Iterative editing workflows could be redesigned to reset inversion error at each step rather than letting it compound.
Compositional errors suggest that current score-guidance formulations under-constrain interactions between multiple objects.
The frontier description supplies a concrete target for training new models that optimize directly against the derived bounds.

Load-bearing premise

That the four tested methods and the chosen metrics are representative enough to characterize the frontier for all diffusion editing pipelines.

What would settle it

A new editing method that stays inside the stated reconstruction-error and stability bounds yet shows none of the listed failure modes on the same tasks would falsify the claimed frontier.

read the original abstract

Diffusion-based generative models enable powerful image editing capabilities, but achieving precise control while maintaining fidelity and safety remains challenging. We present a comprehensive theoretical and empirical study of controllable diffusion-based image editing, analyzing the trade-offs between adherence to user intent, preservation of non-target content, and output quality. Our work spans text- and mask-guided edits, point/drag manipulation, and inversion-based pipelines. We derive mathematical formulations of editing objectives and analyze dynamics of noise injection, score guidance, and inversion error. We provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. We propose algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, and present extensive experiments comparing state-of-the-art methods (e.g.\ TF-ICON \cite{lu2023tficone}, DragFlow \cite{zhou2025dragflow}, InstructPix2Pix \cite{brooks2023instructpix2pix}, UltraEdit \cite{zhao2024ultraedit}) on multiple tasks and metrics (FID, identity similarity, CLIP alignment, artifact scores, etc). Our results reveal key failure modes, such as identity drift, prompt sensitivity, and compositional errors. We also discuss ethical considerations in image editing, including misuse risks, bias, consent, and concept erasure techniques (e.g.\ MACE \cite{lu2024mace}, ANT \cite{li2025ant}, EraseAnything \cite{gao2024eraseanything}) as safeguards. We conclude with best practices and future directions for responsible, high-fidelity diffusion-based editing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a comparative study of four diffusion editing methods with added bounds and failure mode discussion, but the frontier mapping rests on narrow sampling without clear justification for representativeness.

read the letter

The paper's core is a side-by-side look at TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit across text- and mask-guided edits, plus some derived bounds on reconstruction error and repeated-edit stability. It also flags practical issues like identity drift and prompt sensitivity, and includes a section on ethical safeguards such as concept erasure methods.

What stands out is the effort to formalize editing objectives and supply pseudocode for mask-localized and instruction-guided pipelines. The experiments use standard metrics like FID and CLIP alignment, and the ethical discussion ties in recent work on bias and consent. That part feels grounded and useful for people building tools.

The soft spot is the representativeness issue. The stress-test note holds: these four methods are presented as state-of-the-art examples, but the paper does not explain why they cover the main axes of variation in diffusion pipelines, such as different inversion techniques or noise schedules. If other approaches show different stability or error patterns, the bounds and listed failure modes lose force. The abstract frames this as a comprehensive study, yet the selection looks post-hoc rather than systematically justified.

This work is for practitioners who need a quick map of current editing trade-offs and common pitfalls. Readers already familiar with the cited methods will find the synthesis and ethical notes helpful, but it is not aimed at those seeking first-principles advances. It deserves peer review because the empirical comparisons and bounds can be checked directly, even if the generalization claims need tightening.

Referee Report

2 major / 2 minor

Summary. The paper claims to derive mathematical formulations of editing objectives for diffusion-based image editing and provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. It proposes algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, conducts experiments comparing TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit on metrics including FID, identity similarity, CLIP alignment, and artifact scores, identifies failure modes such as identity drift, prompt sensitivity, and compositional errors, and discusses ethical considerations including misuse risks and concept erasure techniques.

Significance. If the derived bounds are rigorous and the empirical characterization of failure modes generalizes, the work could provide useful guidelines for balancing controllability and fidelity in diffusion editing pipelines. The inclusion of pseudocode and explicit discussion of ethical safeguards are positive elements that support reproducibility and responsible use.

major comments (2)

[Experiments (abstract description)] The experiments (as summarized in the abstract) compare only TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit without stated justification that these cover key axes of variation such as mask vs. text guidance, inversion vs. direct editing, or different noise schedules. This selection is load-bearing for the central claim that the derived bounds on reconstruction error and repeated-edit stability, as well as the reported failure modes, characterize the controllability-fidelity frontier across diffusion editing pipelines.
[Theoretical analysis (abstract description)] The abstract asserts derivation of theoretical bounds on reconstruction error and stability, but the provided text does not include the specific equations, assumptions, or proof sketches needed to assess whether these bounds are independent of the chosen methods or hold under the listed failure modes.

minor comments (2)

[Abstract] The abstract contains a citation key 'lu2023tficone' that appears inconsistent with standard BibTeX formatting for the referenced work.
[Experiments (abstract description)] The manuscript would benefit from explicit statements of data exclusion rules and error analysis for the reported metrics, as these are referenced in the abstract but not detailed in the summary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on experimental justification and theoretical presentation. We address each major comment below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Experiments (abstract description)] The experiments (as summarized in the abstract) compare only TF-ICON, DragFlow, InstructPix2Pix, and UltraEdit without stated justification that these cover key axes of variation such as mask vs. text guidance, inversion vs. direct editing, or different noise schedules. This selection is load-bearing for the central claim that the derived bounds on reconstruction error and repeated-edit stability, as well as the reported failure modes, characterize the controllability-fidelity frontier across diffusion editing pipelines.

Authors: We agree that an explicit mapping of the selected methods to the key axes is needed to support generalizability of the bounds and failure-mode analysis. The four methods were chosen to span mask-guided (TF-ICON), drag/point manipulation (DragFlow), instruction-based text editing (InstructPix2Pix), and advanced inversion pipelines (UltraEdit). In the revision we will add a dedicated paragraph in the experimental setup section that justifies this selection against the axes listed by the referee and explicitly notes any uncovered variations (e.g., certain noise schedules) as a limitation. revision: yes
Referee: [Theoretical analysis (abstract description)] The abstract asserts derivation of theoretical bounds on reconstruction error and stability, but the provided text does not include the specific equations, assumptions, or proof sketches needed to assess whether these bounds are independent of the chosen methods or hold under the listed failure modes.

Authors: The derivations appear in Section 3 with the core equations and assumptions, and full proofs are in the appendix; however, to improve accessibility we will move the key reconstruction-error and stability bounds (including their assumptions) into the main text together with a concise proof sketch. We will also add a short discussion linking the bounds to the observed failure modes (identity drift, prompt sensitivity) so readers can directly evaluate independence from specific methods. revision: yes

Circularity Check

0 steps flagged

No circularity: derivations and bounds presented as independent from inputs

full rationale

The abstract and provided excerpts describe derivation of editing objective formulations, analysis of noise/score/inversion dynamics, and theoretical bounds on reconstruction error, repeated-edit stability, and locality. These are positioned as first-principles results from the dynamics, not reductions of fitted parameters or self-citations. Empirical sections compare to external cited methods (TF-ICON, DragFlow, InstructPix2Pix, UltraEdit) using standard metrics without evidence that the bounds or failure-mode characterizations are forced by construction from those inputs. Ethical citations (MACE, ANT, EraseAnything) are to distinct authors and function as external references. No load-bearing step reduces to self-definition, renamed known results, or unverified self-citation chains; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract introduces no new free parameters, axioms, or invented entities; relies on standard assumptions from diffusion model literature such as score guidance dynamics and inversion processes.

pith-pipeline@v0.9.1-grok · 5833 in / 1084 out tokens · 23751 ms · 2026-06-27T20:19:09.259848+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

105 extracted references · 7 linked inside Pith

[1]

Ledits++: Limitless image editing using text-to-image models.arXiv preprint arXiv:2311.16711,

Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolin´ario Passos. Ledits++: Limitless image editing using text-to-image models.arXiv preprint arXiv:2311.16711,

arXiv
[2]

Instructpix2pix: Learning to follow image editing instruc- tions

Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instruc- tions. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 1, 2, 3, 5

2023
[3]

Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing

Mingdeng Cao, Xinya Yang, Zhongang Wang, Changyue Sun, Ying Wu, Dandan Li, and Qingming Huang. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. InIEEE International Conference on Computer Vision (ICCV), 2023. 2

2023
[4]

Offset: Segmentation-based focus shift revision for composed image retrieval

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie. Offset: Segmentation-based focus shift revision for composed image retrieval. InProceedings of the ACM International Conference on Multimedia, page 6113–6122, 2025. 11

2025
[5]

Hud: Hierarchical uncertainty-aware disambiguation network for composed video retrieval

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan. Hud: Hierarchical uncertainty-aware disambiguation network for composed video retrieval. In Proceedings of the ACM International Conference on Multi- media, page 6143–6152, 2025. 11

2025
[6]

Intent: Invariance and discrimination-aware noise mitigation for robust composed image retrieval

Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. Intent: Invariance and discrimination-aware noise mitigation for robust composed image retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, pages 20463–20471, 2026. 11

2026
[7]

ILVR: Conditioning method for denoising diffusion probabilistic models

Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. ILVR: Conditioning method for denoising diffusion probabilistic models. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14347–14356, 2021. 3

2021
[8]

Diffedit: Diffusion-based seman- tic image editing with mask guidance.arXiv preprint arXiv:2210.11427, 2022

Guillaume Couairon, Jakob Verbeek, Holger Schwenk, and Matthieu Cord. Diffedit: Diffusion-based seman- tic image editing with mask guidance.arXiv preprint arXiv:2210.11427, 2022. 2

arXiv 2022
[9]

StableDrag: Stable dragging for point-based image editing

Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, and Limin Wang. StableDrag: Stable dragging for point-based image editing. InProceedings of the European Conference on Computer Vision (ECCV), 2024. 2

2024
[10]

FireFlow: Fast inversion of rec- tified flow for image semantic editing.arXiv preprint arXiv:2412.07517, 2024

Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, and Fan Tang. FireFlow: Fast inversion of rec- tified flow for image semantic editing.arXiv preprint arXiv:2412.07517, 2024. 3, 4, 7, 9

arXiv 2024
[11]

The stable signature: Rooting watermarks in latent diffusion models

Pierre Fernandez, Alexandre Sablayrolles, Matthijs Douze, Herv´e J´egou, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. InIEEE International Conference on Computer Vision (ICCV), 2023. 9

2023
[12]

C2pa technical specification

Coalition for Content Provenance and Authenticity (C2PA). C2pa technical specification. Online Specification, 2024. Versioned specification available at the C2PA specifications website. 9

2024
[13]

Pair: Complementarity-guided disentanglement for composed im- age retrieval

Zhiheng Fu, Zixu Li, Zhiwei Chen, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. Pair: Complementarity-guided disentanglement for composed im- age retrieval. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025. 11

2025
[14]

Air-know: Arbiter-calibrated knowledge- internalizing robust network for composed image retrieval,

Zhiheng Fu, Yupeng Hu, Qianyun Yang, Shiqi Zhang, Zhiwei Chen, and Zixu Li. Air-know: Arbiter-calibrated knowledge- internalizing robust network for composed image retrieval,
[15]

Erasing concepts from diffusion models

Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Eliyahu Nachmani, and Amir Globerson. Erasing concepts from diffusion models. InIEEE International Conference on Computer Vision (ICCV), 2023. 3

2023
[16]

Eraseanything: Enabling con- cept erasure in rectified flow transformers.arXiv preprint arXiv:2412.20413, 2024

Daiheng Gao, Shilin Lu, Shaw Walters, Wenbo Zhou, Ji- aming Chu, Jie Zhang, Bang Zhang, Mengxi Jia, Jian Zhao, Zhaoxin Fan, et al. Eraseanything: Enabling con- cept erasure in rectified flow transformers.arXiv preprint arXiv:2412.20413, 2024. 1, 2, 3, 9

arXiv 2024
[17]

Revoking amnesia: Rl-based trajectory optimization to resurrect erased concepts in diffusion models.arXiv preprint arXiv:2510.03302, 2025

Daiheng Gao, Nanxiang Jiang, Andi Zhang, Shilin Lu, Yufei Tang, Wenbo Zhou, Weiming Zhang, and Zhaoxin Fan. Revoking amnesia: Rl-based trajectory optimization to resurrect erased concepts in diffusion models.arXiv preprint arXiv:2510.03302, 2025. 3

arXiv 2025
[18]

Hqg-net: Unpaired medical image enhancement with high-quality guid- ance.IEEE Transactions on Neural Networks and Learning Systems, 2023

Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxi- ang Tang, Yulun Zhang, Yaowei Wang, and Xiu Li. Hqg-net: Unpaired medical image enhancement with high-quality guid- ance.IEEE Transactions on Neural Networks and Learning Systems, 2023. 11

2023
[19]

Degradation-resistant unfolding network for heterogeneous image fusion

Chunming He, Kai Li, Guoxia Xu, Yulun Zhang, Runze Hu, Zhenhua Guo, and Xiu Li. Degradation-resistant unfolding network for heterogeneous image fusion. InICCV, pages 12611–12621, 2023. 11

2023
[20]

Camouflaged object detection with feature decomposition and edge reconstruction

Chunming He, Kai Li, Yachao Zhang, Longxiang Tang, Yulun Zhang, Zhenhua Guo, and Xiu Li. Camouflaged object detection with feature decomposition and edge reconstruction. InCVPR, pages 22046–22055, 2023. 11

2023
[21]

Weakly- supervised concealed object segmentation with sam-based pseudo labeling and multi-scale feature grouping.NeurIPS, 36, 2024

Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, and Xiu Li. Weakly- supervised concealed object segmentation with sam-based pseudo labeling and multi-scale feature grouping.NeurIPS, 36, 2024. 11

2024
[22]

Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects.ICLR, 2024

Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, and Fisher Yu. Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects.ICLR, 2024. 11

2024
[23]

Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model.ICLR, 2025

Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, and Sina Farsiu. Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model.ICLR, 2025. 11

2025
[24]

Seg- ment concealed object with incomplete supervision.TPAMI,

Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Longxi- ang Tang, Yulun Zhang, Linghe Kong, and Sina Farsiu. Seg- ment concealed object with incomplete supervision.TPAMI,
[25]

Diffusion models in low-level vision: A survey.TPAMI, 2025

Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey.TPAMI, 2025. 11

2025
[26]

Reversible unfolding network for concealed visual perception with generative refinement.arXiv preprint arXiv:2508.15027, 2025

Chunming He, Fengyang Xiao, Rihan Zhang, Chengyu Fang, Deng-Ping Fan, and Sina Farsiu. Reversible unfolding network for concealed visual perception with generative refinement.arXiv preprint arXiv:2508.15027, 2025. 11

arXiv 2025
[27]

Unfoldldm: Deep unfolding-based blind image restoration with latent diffusion priors.arXiv preprint arXiv:2511.18152, 2025

Chunming He, Rihan Zhang, Zheng Chen, Bowen Yang, CHengyu Fang, Yunlong Lin, Fengyang Xiao, and Sina Farsiu. Unfoldldm: Deep unfolding-based blind image restoration with latent diffusion priors.arXiv preprint arXiv:2511.18152, 2025. 11

Pith/arXiv arXiv 2025
[28]

Scaler: Sam- enhanced collaborative learning for label-deficient concealed object segmentation.arXiv preprint arXiv:2511.18136, 2025

Chunming He, Rihan Zhang, Longxiang Tang, Ziyun Yang, Kai Li, Deng-Ping Fan, and Sina Farsiu. Scaler: Sam- enhanced collaborative learning for label-deficient concealed object segmentation.arXiv preprint arXiv:2511.18136, 2025. 11

arXiv 2025
[29]

Run: Reversible unfolding network for concealed object segmentation.ICML, 2025

Chunming He, Rihan Zhang, Fengyang Xiao, Chenyu Fang, Longxiang Tang, Yulun Zhang, Linghe Kong, Deng-Ping Fan, Kai Li, and Sina Farsiu. Run: Reversible unfolding network for concealed object segmentation.ICML, 2025. 11

2025
[30]

Nested unfolding network for real-world concealed object segmentation.arXiv preprint arXiv:2511.18164, 2025

Chunming He, Rihan Zhang, Dingming Zhang, Fengyang Xiao, Deng-Ping Fan, and Sina Farsiu. Nested unfolding network for real-world concealed object segmentation.arXiv preprint arXiv:2511.18164, 2025. 11

arXiv 2025
[31]

Unfoldir: Rethinking deep unfolding network in illumination degrada- tion image restoration.CVPR, 2026

Chunming He, Rihan Zhang, Fengyang Xiao, Chengyu Fang, Longxiang Tang, Yulun Zhang, and Sina Farsiu. Unfoldir: Rethinking deep unfolding network in illumination degrada- tion image restoration.CVPR, 2026. 11

2026
[32]

Refining context- entangled content segmentation via curriculum selection and anti-curriculum promotion.ICML, 2026

Chunming He, Rihan Zhang, Fengyang Xiao, Dingming Zhang, Zhiwen Cao, and Sina Farsiu. Refining context- entangled content segmentation via curriculum selection and anti-curriculum promotion.ICML, 2026. 11

2026
[33]

Prompt-to-prompt im- age editing with cross attention control.arXiv preprint arXiv:2208.01626, 2022

Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt im- age editing with cross attention control.arXiv preprint arXiv:2208.01626, 2022. 2

Pith/arXiv arXiv 2022
[34]

GANs trained by a two time-scale update rule converge to a local Nash equilib- rium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bern- hard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilib- rium. InAdvances in Neural Information Processing Systems (NeurIPS), pages 6626–6637, 2017. 3, 6

2017
[35]

Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022. 4

Pith/arXiv arXiv 2022
[36]

Denoising diffu- sion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 1

2020
[37]

Refine: Composed video retrieval via shared and differential semantics enhancement

Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. Refine: Composed video retrieval via shared and differential semantics enhancement. ACM Transactions on Multimedia Computing, Communica- tions and Applications, 2026. 11

2026
[38]

Median: Adaptive intermediate-grained aggregation network for composed im- age retrieval

Qinlei Huang, Zhiwei Chen, Zixu Li, Chunxiao Wang, Xue- meng Song, Yupeng Hu, and Liqiang Nie. Median: Adaptive intermediate-grained aggregation network for composed im- age retrieval. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025. 11

2025
[39]

Mquant: Unleashing the inference potential of multimodal large language models via static quantization

Dawei Yang Shuoyu Li Shuo Wang Xing Hu Chen Xu Zukang Xu Changyong Shu JiangYong Yu, Sifan Zhou and Zhihang Yuan. Mquant: Unleashing the inference potential of multimodal large language models via static quantization. InProceedings of the 33rd ACM International Conference on Multimedia, 2025. 11

2025
[40]

Boosting diffusion-based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Boosting diffusion-based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023. 3

arXiv 2023
[41]

Imagic: Text-based real image editing with diffusion models

Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

2023
[42]

Patch-wise structural loss for time series forecasting

Dilfira Kudrat, Zongxia Xie, Yanru Sun, Tianyu Jia, and Qinghua Hu. Patch-wise structural loss for time series forecasting. InForty-second International Conference on Machine Learning, 2025. 11

2025
[43]

Set you straight: Auto-steering denoising tra- jectories to sidestep unwanted concepts.arXiv preprint arXiv:2504.12782, 2025

Leyang Li, Shilin Lu, Yan Ren, and Adams Wai-Kin Kong. Set you straight: Auto-steering denoising tra- jectories to sidestep unwanted concepts.arXiv preprint arXiv:2504.12782, 2025. 1, 2, 3, 9

arXiv 2025
[44]

Encoder: Entity mining and modification relation binding for composed image retrieval

Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan. Encoder: Entity mining and modification relation binding for composed image retrieval. InProceed- ings of the AAAI Conference on Artificial Intelligence, pages 5101–5109, 2025. 11

2025
[45]

Finecir: Explicit parsing of fine- grained modification semantics for composed image retrieval

Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, and Liqiang Nie. Finecir: Explicit parsing of fine- grained modification semantics for composed image retrieval. https://arxiv.org/abs/2503.21309, 2025. 11

arXiv 2025
[46]

Retrack: Evidence-driven dual-stream directional anchor calibration network for com- posed video retrieval

Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. Retrack: Evidence-driven dual-stream directional anchor calibration network for com- posed video retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, pages 23373–23381, 2026. 11

2026
[47]

Conesep: Cone-based robust noise-unlearning compositional network for composed image retrieval, 2026

Zixu Li, Yupeng Hu, Zhiwei Chen, Mingyu Zhang, Zhi- heng Fu, and Liqiang Nie. Conesep: Cone-based robust noise-unlearning compositional network for composed image retrieval, 2026. 11

2026
[48]

Habit: Chrono- synergia robust progressive learning framework for com- posed image retrieval

Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. Habit: Chrono- synergia robust progressive learning framework for com- posed image retrieval. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 6762–6770, 2026. 11

2026
[49]

Tema: Anchor the image, follow the text for multi-modification composed image retrieval, 2026

Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, and Liqiang Nie. Tema: Anchor the image, follow the text for multi-modification composed image retrieval, 2026. 11

2026
[50]

Rethinking irregular time series forecasting: A simple yet effective baseline.arXiv preprint arXiv:2505.11250, 2025

Xvyuan Liu, Xiangfei Qiu, Xingjian Wu, Zhengyu Li, Chen- juan Guo, Jilin Hu, and Bin Yang. Rethinking irregular time series forecasting: A simple yet effective baseline.arXiv preprint arXiv:2505.11250, 2025. 10

arXiv 2025
[51]

Tf- icon: Diffusion-based training-free cross-domain image composition

Shilin Lu, Yanzhu Liu, and Adams Wai-Kin Kong. Tf- icon: Diffusion-based training-free cross-domain image composition. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2294–2305, 2023. 1, 2, 3, 5

2023
[52]

Mace: Mass concept erasure in diffusion models

Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. Mace: Mass concept erasure in diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6430–6440,
[53]

Robust watermarking using generative priors against image editing: From benchmarking to advances

Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. Robust watermarking using generative priors against image editing: From benchmarking to advances. arXiv preprint arXiv:2410.18775, 2024. 3

arXiv 2024
[54]

Does flux already know how to perform physically plausible image composi- tion?arXiv preprint arXiv:2509.21278, 2025

Shilin Lu, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, and Adams Wai-Kin Kong. Does flux already know how to perform physically plausible image composi- tion?arXiv preprint arXiv:2509.21278, 2025. 2

arXiv 2025
[55]

RePaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11461–11471, 2022. 2, 3

2022
[56]

Sdedit: Guided image synthesis and editing with stochastic differential equations.arXiv preprint arXiv:2108.01073, 2021

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations.arXiv preprint arXiv:2108.01073, 2021. 3, 5

Pith/arXiv arXiv 2021
[57]

Null-text inversion for editing real images using guided diffusion models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

2023
[58]

Langtime: A language-guided unified model for time series forecasting with proximal policy optimization

Wenzhe Niu, Zongxia Xie, Yanru Sun, Wei He, Man Xu, and Chao Hao. Langtime: A language-guided unified model for time series forecasting with proximal policy optimization. In Forty-second International Conference on Machine Learning,
[59]

Drag your GAN: Interactive point-based manipulation on the generative image manifold

Xingang Pan, Ayush Tewari, Thomas Leimk¨ uhler, Lingjie Liu, Abhimitra Meka, and Christian Theobalt. Drag your GAN: Interactive point-based manipulation on the generative image manifold. InACM SIGGRAPH 2023 Conference Proceedings, 2023. 2

2023
[60]

Melt: Improve composed image retrieval via the modification frequentation- rarity balance network.arXiv preprint arXiv:2603.29291,

Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhi- heng Fu, Xuemeng Song, and Yupeng Hu. Melt: Improve composed image retrieval via the modification frequentation- rarity balance network.arXiv preprint arXiv:2603.29291,

arXiv
[61]

Jensen, Zhenli Sheng, and Bin Yang

Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, and Bin Yang. TFB: Towards com- prehensive and fair benchmarking of time series forecasting methods. InProc. VLDB Endow., pages 2363–2377, 2024. 10

2024
[62]

A comprehensive survey of deep learning for multivariate time series forecasting: A channel strategy perspective.arXiv preprint arXiv:2502.10721, 2025

Xiangfei Qiu, Hanyin Cheng, Xingjian Wu, Jilin Hu, and Chenjuan Guo. A comprehensive survey of deep learning for multivariate time series forecasting: A channel strategy perspective.arXiv preprint arXiv:2502.10721, 2025

arXiv 2025
[63]

Jensen, and Bin Yang

Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, and Bin Yang. Tab: Unified benchmarking of time series anomaly detection methods. InProc. VLDB Endow., 2025

2025
[64]

Dbloss: Decomposition-based loss function for time series forecast- ing

Xiangfei Qiu, Xingjian Wu, Hanyin Cheng, Xvyuan Liu, Chenjuan Guo, Jilin Hu, and Bin Yang. Dbloss: Decomposition-based loss function for time series forecast- ing. InNeurIPS, 2025

2025
[65]

DUET: Dual clustering enhanced multivariate time series forecasting

Xiangfei Qiu, Xingjian Wu, Yan Lin, Chenjuan Guo, Jilin Hu, and Bin Yang. DUET: Dual clustering enhanced multivariate time series forecasting. InSIGKDD, pages 1185–1196, 2025

2025
[66]

Dag: A dual causal network for time series forecasting with ex- ogenous variables.arXiv preprint arXiv:2509.14933, 2025

Xiangfei Qiu, Yuhan Zhu, Zhengyu Li, Hanyin Cheng, Xingjian Wu, Chenjuan Guo, Bin Yang, and Jilin Hu. Dag: A dual causal network for time series forecasting with ex- ogenous variables.arXiv preprint arXiv:2509.14933, 2025. 10

Pith/arXiv arXiv 2025
[67]

All that glitters is not gold: Key-secured 3d secrets within 3d gaussian splatting.arXiv preprint arXiv:2503.07191, 2025

Yan Ren, Shilin Lu, and Adams Wai-Kin Kong. All that glitters is not gold: Key-secured 3d secrets within 3d gaussian splatting.arXiv preprint arXiv:2503.07191, 2025. 3

arXiv 2025
[68]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2

2022
[69]

Rectified flow based inversion for real image editing.arXiv preprint arXiv:2410.10792, 2024

Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Cara- manis, Sanjay Shakkottai, and Wen-Sheng Chu. Rectified flow based inversion for real image editing.arXiv preprint arXiv:2410.10792, 2024. 3

arXiv 2024
[70]

Dragdiffusion: Harnessing diffusion models for interactive point-based image editing.arXiv preprint arXiv:2306.14435,

Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y F Tan, and Song Bai. Dragdiffusion: Harnessing diffusion models for interactive point-based image editing.arXiv preprint arXiv:2306.14435,

arXiv
[71]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. InInternational Conference on Learning Representations (ICLR), 2021. 1, 3, 4

2021
[72]

Solar wind speed prediction with two- dimensional attention mechanism.Space Weather, 19(7): e2020SW002707, 2021

Yanru Sun, Zongxia Xie, Yanhong Chen, Xin Huang, and Qinghua Hu. Solar wind speed prediction with two- dimensional attention mechanism.Space Weather, 19(7): e2020SW002707, 2021. 11

2021
[73]

Accurate solar wind speed prediction with multimodality information.Space: Science & Technology, 2022

Yanru Sun, Zongxia Xie, Yanhong Chen, and Qinghua Hu. Accurate solar wind speed prediction with multimodality information.Space: Science & Technology, 2022

2022
[74]

Adapting llms to time series forecasting via temporal heterogeneity modeling and semantic alignment.arXiv preprint arXiv:2508.07195, 2025

Yanru Sun, Emadeldeen Eldele, Zongxia Xie, Yucheng Wang, Wenzhe Niu, Qinghua Hu, Chee Keong Kwoh, and Min Wu. Adapting llms to time series forecasting via temporal heterogeneity modeling and semantic alignment.arXiv preprint arXiv:2508.07195, 2025

arXiv 2025
[75]

Hierarchical classification auxiliary net- work for time series forecasting

Yanru Sun, Zongxia Xie, Dongyue Chen, Emadeldeen Eldele, and Qinghua Hu. Hierarchical classification auxiliary net- work for time series forecasting. InProceedings of the AAAI Conference on Artificial Intelligence, pages 20743–20751, 2025

2025
[76]

Learning pattern-specific experts for time series forecasting under patch-level distribution shift

Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Qinghua Hu, and Min Wu. Learning pattern-specific experts for time series forecasting under patch-level distribution shift. Advances in Neural Information Processing Systems, 2025

2025
[77]

Ppgf: Probability pattern-guided time series forecasting.IEEE Transactions on Neural Networks and Learning Systems, 2025

Yanru Sun, Zongxia Xie, Haoyu Xing, Hualong Yu, and Qinghua Hu. Ppgf: Probability pattern-guided time series forecasting.IEEE Transactions on Neural Networks and Learning Systems, 2025. 11

2025
[78]

Plug-and-play diffusion features for text-driven image-to- image translation

Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven image-to- image translation. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 3

2023
[79]

K 2V AE: A koopman-kalman enhanced variational autoencoder for probabilistic time series forecasting

Xingjian Wu, Xiangfei Qiu, Hongfan Gao, Jilin Hu, Bin Yang, and Chenjuan Guo. K 2V AE: A koopman-kalman enhanced variational autoencoder for probabilistic time series forecasting. InICML, 2025. 10

2025
[80]

CATCH: Channel-aware multivariate time series anomaly detection via frequency patching

Xingjian Wu, Xiangfei Qiu, Zhengyu Li, Yihang Wang, Jilin Hu, Chenjuan Guo, Hui Xiong, and Bin Yang. CATCH: Channel-aware multivariate time series anomaly detection via frequency patching. InICLR, 2025. 10

2025

Showing first 80 references.

[1] [1]

Ledits++: Limitless image editing using text-to-image models.arXiv preprint arXiv:2311.16711,

Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolin´ario Passos. Ledits++: Limitless image editing using text-to-image models.arXiv preprint arXiv:2311.16711,

arXiv

[2] [2]

Instructpix2pix: Learning to follow image editing instruc- tions

Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instruc- tions. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 1, 2, 3, 5

2023

[3] [3]

Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing

Mingdeng Cao, Xinya Yang, Zhongang Wang, Changyue Sun, Ying Wu, Dandan Li, and Qingming Huang. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. InIEEE International Conference on Computer Vision (ICCV), 2023. 2

2023

[4] [4]

Offset: Segmentation-based focus shift revision for composed image retrieval

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Xuemeng Song, and Liqiang Nie. Offset: Segmentation-based focus shift revision for composed image retrieval. InProceedings of the ACM International Conference on Multimedia, page 6113–6122, 2025. 11

2025

[5] [5]

Hud: Hierarchical uncertainty-aware disambiguation network for composed video retrieval

Zhiwei Chen, Yupeng Hu, Zixu Li, Zhiheng Fu, Haokun Wen, and Weili Guan. Hud: Hierarchical uncertainty-aware disambiguation network for composed video retrieval. In Proceedings of the ACM International Conference on Multi- media, page 6143–6152, 2025. 11

2025

[6] [6]

Intent: Invariance and discrimination-aware noise mitigation for robust composed image retrieval

Zhiwei Chen, Yupeng Hu, Zhiheng Fu, Zixu Li, Jiale Huang, Qinlei Huang, and Yinwei Wei. Intent: Invariance and discrimination-aware noise mitigation for robust composed image retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, pages 20463–20471, 2026. 11

2026

[7] [7]

ILVR: Conditioning method for denoising diffusion probabilistic models

Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. ILVR: Conditioning method for denoising diffusion probabilistic models. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 14347–14356, 2021. 3

2021

[8] [8]

Diffedit: Diffusion-based seman- tic image editing with mask guidance.arXiv preprint arXiv:2210.11427, 2022

Guillaume Couairon, Jakob Verbeek, Holger Schwenk, and Matthieu Cord. Diffedit: Diffusion-based seman- tic image editing with mask guidance.arXiv preprint arXiv:2210.11427, 2022. 2

arXiv 2022

[9] [9]

StableDrag: Stable dragging for point-based image editing

Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, and Limin Wang. StableDrag: Stable dragging for point-based image editing. InProceedings of the European Conference on Computer Vision (ECCV), 2024. 2

2024

[10] [10]

FireFlow: Fast inversion of rec- tified flow for image semantic editing.arXiv preprint arXiv:2412.07517, 2024

Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, and Fan Tang. FireFlow: Fast inversion of rec- tified flow for image semantic editing.arXiv preprint arXiv:2412.07517, 2024. 3, 4, 7, 9

arXiv 2024

[11] [11]

The stable signature: Rooting watermarks in latent diffusion models

Pierre Fernandez, Alexandre Sablayrolles, Matthijs Douze, Herv´e J´egou, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. InIEEE International Conference on Computer Vision (ICCV), 2023. 9

2023

[12] [12]

C2pa technical specification

Coalition for Content Provenance and Authenticity (C2PA). C2pa technical specification. Online Specification, 2024. Versioned specification available at the C2PA specifications website. 9

2024

[13] [13]

Pair: Complementarity-guided disentanglement for composed im- age retrieval

Zhiheng Fu, Zixu Li, Zhiwei Chen, Chunxiao Wang, Xuemeng Song, Yupeng Hu, and Liqiang Nie. Pair: Complementarity-guided disentanglement for composed im- age retrieval. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025. 11

2025

[14] [14]

Air-know: Arbiter-calibrated knowledge- internalizing robust network for composed image retrieval,

Zhiheng Fu, Yupeng Hu, Qianyun Yang, Shiqi Zhang, Zhiwei Chen, and Zixu Li. Air-know: Arbiter-calibrated knowledge- internalizing robust network for composed image retrieval,

[15] [15]

Erasing concepts from diffusion models

Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Eliyahu Nachmani, and Amir Globerson. Erasing concepts from diffusion models. InIEEE International Conference on Computer Vision (ICCV), 2023. 3

2023

[16] [16]

Eraseanything: Enabling con- cept erasure in rectified flow transformers.arXiv preprint arXiv:2412.20413, 2024

Daiheng Gao, Shilin Lu, Shaw Walters, Wenbo Zhou, Ji- aming Chu, Jie Zhang, Bang Zhang, Mengxi Jia, Jian Zhao, Zhaoxin Fan, et al. Eraseanything: Enabling con- cept erasure in rectified flow transformers.arXiv preprint arXiv:2412.20413, 2024. 1, 2, 3, 9

arXiv 2024

[17] [17]

Revoking amnesia: Rl-based trajectory optimization to resurrect erased concepts in diffusion models.arXiv preprint arXiv:2510.03302, 2025

Daiheng Gao, Nanxiang Jiang, Andi Zhang, Shilin Lu, Yufei Tang, Wenbo Zhou, Weiming Zhang, and Zhaoxin Fan. Revoking amnesia: Rl-based trajectory optimization to resurrect erased concepts in diffusion models.arXiv preprint arXiv:2510.03302, 2025. 3

arXiv 2025

[18] [18]

Hqg-net: Unpaired medical image enhancement with high-quality guid- ance.IEEE Transactions on Neural Networks and Learning Systems, 2023

Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxi- ang Tang, Yulun Zhang, Yaowei Wang, and Xiu Li. Hqg-net: Unpaired medical image enhancement with high-quality guid- ance.IEEE Transactions on Neural Networks and Learning Systems, 2023. 11

2023

[19] [19]

Degradation-resistant unfolding network for heterogeneous image fusion

Chunming He, Kai Li, Guoxia Xu, Yulun Zhang, Runze Hu, Zhenhua Guo, and Xiu Li. Degradation-resistant unfolding network for heterogeneous image fusion. InICCV, pages 12611–12621, 2023. 11

2023

[20] [20]

Camouflaged object detection with feature decomposition and edge reconstruction

Chunming He, Kai Li, Yachao Zhang, Longxiang Tang, Yulun Zhang, Zhenhua Guo, and Xiu Li. Camouflaged object detection with feature decomposition and edge reconstruction. InCVPR, pages 22046–22055, 2023. 11

2023

[21] [21]

Weakly- supervised concealed object segmentation with sam-based pseudo labeling and multi-scale feature grouping.NeurIPS, 36, 2024

Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, and Xiu Li. Weakly- supervised concealed object segmentation with sam-based pseudo labeling and multi-scale feature grouping.NeurIPS, 36, 2024. 11

2024

[22] [22]

Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects.ICLR, 2024

Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, and Fisher Yu. Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects.ICLR, 2024. 11

2024

[23] [23]

Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model.ICLR, 2025

Chunming He, Chengyu Fang, Yulun Zhang, Tian Ye, Kai Li, Longxiang Tang, Zhenhua Guo, Xiu Li, and Sina Farsiu. Reti-diff: Illumination degradation image restoration with retinex-based latent diffusion model.ICLR, 2025. 11

2025

[24] [24]

Seg- ment concealed object with incomplete supervision.TPAMI,

Chunming He, Kai Li, Yachao Zhang, Ziyun Yang, Longxi- ang Tang, Yulun Zhang, Linghe Kong, and Sina Farsiu. Seg- ment concealed object with incomplete supervision.TPAMI,

[25] [25]

Diffusion models in low-level vision: A survey.TPAMI, 2025

Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, and Xiu Li. Diffusion models in low-level vision: A survey.TPAMI, 2025. 11

2025

[26] [26]

Reversible unfolding network for concealed visual perception with generative refinement.arXiv preprint arXiv:2508.15027, 2025

Chunming He, Fengyang Xiao, Rihan Zhang, Chengyu Fang, Deng-Ping Fan, and Sina Farsiu. Reversible unfolding network for concealed visual perception with generative refinement.arXiv preprint arXiv:2508.15027, 2025. 11

arXiv 2025

[27] [27]

Unfoldldm: Deep unfolding-based blind image restoration with latent diffusion priors.arXiv preprint arXiv:2511.18152, 2025

Chunming He, Rihan Zhang, Zheng Chen, Bowen Yang, CHengyu Fang, Yunlong Lin, Fengyang Xiao, and Sina Farsiu. Unfoldldm: Deep unfolding-based blind image restoration with latent diffusion priors.arXiv preprint arXiv:2511.18152, 2025. 11

Pith/arXiv arXiv 2025

[28] [28]

Scaler: Sam- enhanced collaborative learning for label-deficient concealed object segmentation.arXiv preprint arXiv:2511.18136, 2025

Chunming He, Rihan Zhang, Longxiang Tang, Ziyun Yang, Kai Li, Deng-Ping Fan, and Sina Farsiu. Scaler: Sam- enhanced collaborative learning for label-deficient concealed object segmentation.arXiv preprint arXiv:2511.18136, 2025. 11

arXiv 2025

[29] [29]

Run: Reversible unfolding network for concealed object segmentation.ICML, 2025

Chunming He, Rihan Zhang, Fengyang Xiao, Chenyu Fang, Longxiang Tang, Yulun Zhang, Linghe Kong, Deng-Ping Fan, Kai Li, and Sina Farsiu. Run: Reversible unfolding network for concealed object segmentation.ICML, 2025. 11

2025

[30] [30]

Nested unfolding network for real-world concealed object segmentation.arXiv preprint arXiv:2511.18164, 2025

Chunming He, Rihan Zhang, Dingming Zhang, Fengyang Xiao, Deng-Ping Fan, and Sina Farsiu. Nested unfolding network for real-world concealed object segmentation.arXiv preprint arXiv:2511.18164, 2025. 11

arXiv 2025

[31] [31]

Unfoldir: Rethinking deep unfolding network in illumination degrada- tion image restoration.CVPR, 2026

Chunming He, Rihan Zhang, Fengyang Xiao, Chengyu Fang, Longxiang Tang, Yulun Zhang, and Sina Farsiu. Unfoldir: Rethinking deep unfolding network in illumination degrada- tion image restoration.CVPR, 2026. 11

2026

[32] [32]

Refining context- entangled content segmentation via curriculum selection and anti-curriculum promotion.ICML, 2026

Chunming He, Rihan Zhang, Fengyang Xiao, Dingming Zhang, Zhiwen Cao, and Sina Farsiu. Refining context- entangled content segmentation via curriculum selection and anti-curriculum promotion.ICML, 2026. 11

2026

[33] [33]

Prompt-to-prompt im- age editing with cross attention control.arXiv preprint arXiv:2208.01626, 2022

Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt im- age editing with cross attention control.arXiv preprint arXiv:2208.01626, 2022. 2

Pith/arXiv arXiv 2022

[34] [34]

GANs trained by a two time-scale update rule converge to a local Nash equilib- rium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bern- hard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilib- rium. InAdvances in Neural Information Processing Systems (NeurIPS), pages 6626–6637, 2017. 3, 6

2017

[35] [35]

Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.arXiv preprint arXiv:2207.12598, 2022. 4

Pith/arXiv arXiv 2022

[36] [36]

Denoising diffu- sion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models. InAdvances in Neural Information Processing Systems (NeurIPS), 2020. 1

2020

[37] [37]

Refine: Composed video retrieval via shared and differential semantics enhancement

Yupeng Hu, Zixu Li, Zhiwei Chen, Qinlei Huang, Zhiheng Fu, Mingzhu Xu, and Liqiang Nie. Refine: Composed video retrieval via shared and differential semantics enhancement. ACM Transactions on Multimedia Computing, Communica- tions and Applications, 2026. 11

2026

[38] [38]

Median: Adaptive intermediate-grained aggregation network for composed im- age retrieval

Qinlei Huang, Zhiwei Chen, Zixu Li, Chunxiao Wang, Xue- meng Song, Yupeng Hu, and Liqiang Nie. Median: Adaptive intermediate-grained aggregation network for composed im- age retrieval. InICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2025. 11

2025

[39] [39]

Mquant: Unleashing the inference potential of multimodal large language models via static quantization

Dawei Yang Shuoyu Li Shuo Wang Xing Hu Chen Xu Zukang Xu Changyong Shu JiangYong Yu, Sifan Zhou and Zhihang Yuan. Mquant: Unleashing the inference potential of multimodal large language models via static quantization. InProceedings of the 33rd ACM International Conference on Multimedia, 2025. 11

2025

[40] [40]

Boosting diffusion-based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Boosting diffusion-based editing with 3 lines of code.arXiv preprint arXiv:2310.01506, 2023. 3

arXiv 2023

[41] [41]

Imagic: Text-based real image editing with diffusion models

Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

2023

[42] [42]

Patch-wise structural loss for time series forecasting

Dilfira Kudrat, Zongxia Xie, Yanru Sun, Tianyu Jia, and Qinghua Hu. Patch-wise structural loss for time series forecasting. InForty-second International Conference on Machine Learning, 2025. 11

2025

[43] [43]

Set you straight: Auto-steering denoising tra- jectories to sidestep unwanted concepts.arXiv preprint arXiv:2504.12782, 2025

Leyang Li, Shilin Lu, Yan Ren, and Adams Wai-Kin Kong. Set you straight: Auto-steering denoising tra- jectories to sidestep unwanted concepts.arXiv preprint arXiv:2504.12782, 2025. 1, 2, 3, 9

arXiv 2025

[44] [44]

Encoder: Entity mining and modification relation binding for composed image retrieval

Zixu Li, Zhiwei Chen, Haokun Wen, Zhiheng Fu, Yupeng Hu, and Weili Guan. Encoder: Entity mining and modification relation binding for composed image retrieval. InProceed- ings of the AAAI Conference on Artificial Intelligence, pages 5101–5109, 2025. 11

2025

[45] [45]

Finecir: Explicit parsing of fine- grained modification semantics for composed image retrieval

Zixu Li, Zhiheng Fu, Yupeng Hu, Zhiwei Chen, Haokun Wen, and Liqiang Nie. Finecir: Explicit parsing of fine- grained modification semantics for composed image retrieval. https://arxiv.org/abs/2503.21309, 2025. 11

arXiv 2025

[46] [46]

Retrack: Evidence-driven dual-stream directional anchor calibration network for com- posed video retrieval

Zixu Li, Yupeng Hu, Zhiwei Chen, Qinlei Huang, Guozhi Qiu, Zhiheng Fu, and Meng Liu. Retrack: Evidence-driven dual-stream directional anchor calibration network for com- posed video retrieval. InProceedings of the AAAI Conference on Artificial Intelligence, pages 23373–23381, 2026. 11

2026

[47] [47]

Conesep: Cone-based robust noise-unlearning compositional network for composed image retrieval, 2026

Zixu Li, Yupeng Hu, Zhiwei Chen, Mingyu Zhang, Zhi- heng Fu, and Liqiang Nie. Conesep: Cone-based robust noise-unlearning compositional network for composed image retrieval, 2026. 11

2026

[48] [48]

Habit: Chrono- synergia robust progressive learning framework for com- posed image retrieval

Zixu Li, Yupeng Hu, Zhiwei Chen, Shiqi Zhang, Qinlei Huang, Zhiheng Fu, and Yinwei Wei. Habit: Chrono- synergia robust progressive learning framework for com- posed image retrieval. InProceedings of the AAAI Con- ference on Artificial Intelligence, pages 6762–6770, 2026. 11

2026

[49] [49]

Tema: Anchor the image, follow the text for multi-modification composed image retrieval, 2026

Zixu Li, Yupeng Hu, Zhiheng Fu, Zhiwei Chen, Yongqi Li, and Liqiang Nie. Tema: Anchor the image, follow the text for multi-modification composed image retrieval, 2026. 11

2026

[50] [50]

Rethinking irregular time series forecasting: A simple yet effective baseline.arXiv preprint arXiv:2505.11250, 2025

Xvyuan Liu, Xiangfei Qiu, Xingjian Wu, Zhengyu Li, Chen- juan Guo, Jilin Hu, and Bin Yang. Rethinking irregular time series forecasting: A simple yet effective baseline.arXiv preprint arXiv:2505.11250, 2025. 10

arXiv 2025

[51] [51]

Tf- icon: Diffusion-based training-free cross-domain image composition

Shilin Lu, Yanzhu Liu, and Adams Wai-Kin Kong. Tf- icon: Diffusion-based training-free cross-domain image composition. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2294–2305, 2023. 1, 2, 3, 5

2023

[52] [52]

Mace: Mass concept erasure in diffusion models

Shilin Lu, Zilan Wang, Leyang Li, Yanzhu Liu, and Adams Wai-Kin Kong. Mace: Mass concept erasure in diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6430–6440,

[53] [53]

Robust watermarking using generative priors against image editing: From benchmarking to advances

Shilin Lu, Zihan Zhou, Jiayou Lu, Yuanzhi Zhu, and Adams Wai-Kin Kong. Robust watermarking using generative priors against image editing: From benchmarking to advances. arXiv preprint arXiv:2410.18775, 2024. 3

arXiv 2024

[54] [54]

Does flux already know how to perform physically plausible image composi- tion?arXiv preprint arXiv:2509.21278, 2025

Shilin Lu, Zhuming Lian, Zihan Zhou, Shaocong Zhang, Chen Zhao, and Adams Wai-Kin Kong. Does flux already know how to perform physically plausible image composi- tion?arXiv preprint arXiv:2509.21278, 2025. 2

arXiv 2025

[55] [55]

RePaint: Inpainting using denoising diffusion probabilistic models

Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using denoising diffusion probabilistic models. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11461–11471, 2022. 2, 3

2022

[56] [56]

Sdedit: Guided image synthesis and editing with stochastic differential equations.arXiv preprint arXiv:2108.01073, 2021

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, and Stefano Ermon. Sdedit: Guided image synthesis and editing with stochastic differential equations.arXiv preprint arXiv:2108.01073, 2021. 3, 5

Pith/arXiv arXiv 2021

[57] [57]

Null-text inversion for editing real images using guided diffusion models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

2023

[58] [58]

Langtime: A language-guided unified model for time series forecasting with proximal policy optimization

Wenzhe Niu, Zongxia Xie, Yanru Sun, Wei He, Man Xu, and Chao Hao. Langtime: A language-guided unified model for time series forecasting with proximal policy optimization. In Forty-second International Conference on Machine Learning,

[59] [59]

Drag your GAN: Interactive point-based manipulation on the generative image manifold

Xingang Pan, Ayush Tewari, Thomas Leimk¨ uhler, Lingjie Liu, Abhimitra Meka, and Christian Theobalt. Drag your GAN: Interactive point-based manipulation on the generative image manifold. InACM SIGGRAPH 2023 Conference Proceedings, 2023. 2

2023

[60] [60]

Melt: Improve composed image retrieval via the modification frequentation- rarity balance network.arXiv preprint arXiv:2603.29291,

Guozhi Qiu, Zhiwei Chen, Zixu Li, Qinlei Huang, Zhi- heng Fu, Xuemeng Song, and Yupeng Hu. Melt: Improve composed image retrieval via the modification frequentation- rarity balance network.arXiv preprint arXiv:2603.29291,

arXiv

[61] [61]

Jensen, Zhenli Sheng, and Bin Yang

Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, and Bin Yang. TFB: Towards com- prehensive and fair benchmarking of time series forecasting methods. InProc. VLDB Endow., pages 2363–2377, 2024. 10

2024

[62] [62]

A comprehensive survey of deep learning for multivariate time series forecasting: A channel strategy perspective.arXiv preprint arXiv:2502.10721, 2025

Xiangfei Qiu, Hanyin Cheng, Xingjian Wu, Jilin Hu, and Chenjuan Guo. A comprehensive survey of deep learning for multivariate time series forecasting: A channel strategy perspective.arXiv preprint arXiv:2502.10721, 2025

arXiv 2025

[63] [63]

Jensen, and Bin Yang

Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, and Bin Yang. Tab: Unified benchmarking of time series anomaly detection methods. InProc. VLDB Endow., 2025

2025

[64] [64]

Dbloss: Decomposition-based loss function for time series forecast- ing

Xiangfei Qiu, Xingjian Wu, Hanyin Cheng, Xvyuan Liu, Chenjuan Guo, Jilin Hu, and Bin Yang. Dbloss: Decomposition-based loss function for time series forecast- ing. InNeurIPS, 2025

2025

[65] [65]

DUET: Dual clustering enhanced multivariate time series forecasting

Xiangfei Qiu, Xingjian Wu, Yan Lin, Chenjuan Guo, Jilin Hu, and Bin Yang. DUET: Dual clustering enhanced multivariate time series forecasting. InSIGKDD, pages 1185–1196, 2025

2025

[66] [66]

Dag: A dual causal network for time series forecasting with ex- ogenous variables.arXiv preprint arXiv:2509.14933, 2025

Xiangfei Qiu, Yuhan Zhu, Zhengyu Li, Hanyin Cheng, Xingjian Wu, Chenjuan Guo, Bin Yang, and Jilin Hu. Dag: A dual causal network for time series forecasting with ex- ogenous variables.arXiv preprint arXiv:2509.14933, 2025. 10

Pith/arXiv arXiv 2025

[67] [67]

All that glitters is not gold: Key-secured 3d secrets within 3d gaussian splatting.arXiv preprint arXiv:2503.07191, 2025

Yan Ren, Shilin Lu, and Adams Wai-Kin Kong. All that glitters is not gold: Key-secured 3d secrets within 3d gaussian splatting.arXiv preprint arXiv:2503.07191, 2025. 3

arXiv 2025

[68] [68]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1, 2

2022

[69] [69]

Rectified flow based inversion for real image editing.arXiv preprint arXiv:2410.10792, 2024

Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Cara- manis, Sanjay Shakkottai, and Wen-Sheng Chu. Rectified flow based inversion for real image editing.arXiv preprint arXiv:2410.10792, 2024. 3

arXiv 2024

[70] [70]

Dragdiffusion: Harnessing diffusion models for interactive point-based image editing.arXiv preprint arXiv:2306.14435,

Yujun Shi, Chuhui Xue, Jun Hao Liew, Jiachun Pan, Hanshu Yan, Wenqing Zhang, Vincent Y F Tan, and Song Bai. Dragdiffusion: Harnessing diffusion models for interactive point-based image editing.arXiv preprint arXiv:2306.14435,

arXiv

[71] [71]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. InInternational Conference on Learning Representations (ICLR), 2021. 1, 3, 4

2021

[72] [72]

Solar wind speed prediction with two- dimensional attention mechanism.Space Weather, 19(7): e2020SW002707, 2021

Yanru Sun, Zongxia Xie, Yanhong Chen, Xin Huang, and Qinghua Hu. Solar wind speed prediction with two- dimensional attention mechanism.Space Weather, 19(7): e2020SW002707, 2021. 11

2021

[73] [73]

Accurate solar wind speed prediction with multimodality information.Space: Science & Technology, 2022

Yanru Sun, Zongxia Xie, Yanhong Chen, and Qinghua Hu. Accurate solar wind speed prediction with multimodality information.Space: Science & Technology, 2022

2022

[74] [74]

Adapting llms to time series forecasting via temporal heterogeneity modeling and semantic alignment.arXiv preprint arXiv:2508.07195, 2025

Yanru Sun, Emadeldeen Eldele, Zongxia Xie, Yucheng Wang, Wenzhe Niu, Qinghua Hu, Chee Keong Kwoh, and Min Wu. Adapting llms to time series forecasting via temporal heterogeneity modeling and semantic alignment.arXiv preprint arXiv:2508.07195, 2025

arXiv 2025

[75] [75]

Hierarchical classification auxiliary net- work for time series forecasting

Yanru Sun, Zongxia Xie, Dongyue Chen, Emadeldeen Eldele, and Qinghua Hu. Hierarchical classification auxiliary net- work for time series forecasting. InProceedings of the AAAI Conference on Artificial Intelligence, pages 20743–20751, 2025

2025

[76] [76]

Learning pattern-specific experts for time series forecasting under patch-level distribution shift

Yanru Sun, Zongxia Xie, Emadeldeen Eldele, Dongyue Chen, Qinghua Hu, and Min Wu. Learning pattern-specific experts for time series forecasting under patch-level distribution shift. Advances in Neural Information Processing Systems, 2025

2025

[77] [77]

Ppgf: Probability pattern-guided time series forecasting.IEEE Transactions on Neural Networks and Learning Systems, 2025

Yanru Sun, Zongxia Xie, Haoyu Xing, Hualong Yu, and Qinghua Hu. Ppgf: Probability pattern-guided time series forecasting.IEEE Transactions on Neural Networks and Learning Systems, 2025. 11

2025

[78] [78]

Plug-and-play diffusion features for text-driven image-to- image translation

Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven image-to- image translation. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 3

2023

[79] [79]

K 2V AE: A koopman-kalman enhanced variational autoencoder for probabilistic time series forecasting

Xingjian Wu, Xiangfei Qiu, Hongfan Gao, Jilin Hu, Bin Yang, and Chenjuan Guo. K 2V AE: A koopman-kalman enhanced variational autoencoder for probabilistic time series forecasting. InICML, 2025. 10

2025

[80] [80]

CATCH: Channel-aware multivariate time series anomaly detection via frequency patching

Xingjian Wu, Xiangfei Qiu, Zhengyu Li, Yihang Wang, Jilin Hu, Chenjuan Guo, Hui Xiong, and Bin Yang. CATCH: Channel-aware multivariate time series anomaly detection via frequency patching. InICLR, 2025. 10

2025