PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal

Bin Xu; Cheng Xue; Chunhui Liu; Ming Chen; Xiaowei Hu; Zining Fang

arXiv: 2603.22844 · v4 · submitted 2026-03-24 · 💻 cs.AI

Zining Fang, Cheng Xue, Chunhui Liu, Bin Xu, Ming Chen, Xiaowei Hu

Pith reviewed 2026-05-15 01:19 UTC · model grok-4.3

classification 💻 cs.AI

keywords: surgical smoke removal, diffusion models, relative policy optimization, physics-guided reward, semantic reward, robotic surgery, image restoration

The pith

PhySe-RPO converts diffusion-based surgical smoke removal into a stochastic policy guided by physics and semantic rewards for consistent restoration under limited data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deterministic diffusion restoration pipelines can be reframed as explorable stochastic policies optimized through group-relative updates. This shift allows the model to incorporate a physics-guided reward enforcing illumination and color consistency alongside a CLIP-derived semantic reward that favors smoke-free and anatomically coherent outputs. A reference-free perceptual constraint further stabilizes results. The resulting framework is tested on both synthetic and real robotic surgical datasets, demonstrating restoration that remains physically plausible and semantically faithful without requiring extensive paired supervision.
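The following is a minimal sketch of what an illumination-and-color consistency reward could look like. The passage quoted later on this page names an inter-channel prior and an intra-channel prior and combines two terms as $R_{PG} = R_A + R_B$. The sketch assumes $R_A$ is the inter-channel term and $R_B$ the intra-channel one. The concrete forms are also assumptions chosen for illustration, not the authors' definitions: a gray-world channel-balance term and a luminance contrast-and-exposure term. All function names are hypothetical.

```python
import numpy as np

# Rec. 601 luma weights (they sum to 1.0).
LUMA = np.array([0.299, 0.587, 0.114])


def inter_channel_reward(img: np.ndarray) -> float:
    """Hypothetical R_A (gray-world prior): 0 when the R, G, B means agree,
    increasingly negative as a colour cast pulls them apart.
    `img` is an H x W x 3 float array with values in [0, 1]."""
    means = img.reshape(-1, 3).mean(axis=0)
    return -float(np.abs(means - means.mean()).sum())


def intra_channel_reward(img: np.ndarray, target: float = 0.5) -> float:
    """Hypothetical R_B: rewards luminance contrast (std) and penalises a mean
    luminance far from `target`, since a smoke veil both lifts and flattens it."""
    lum = img @ LUMA  # (H, W, 3) @ (3,) -> (H, W)
    return float(lum.std() - abs(lum.mean() - target))


def physics_reward(img: np.ndarray) -> float:
    """R_PG = R_A + R_B, using the assumed term definitions above."""
    return inter_channel_reward(img) + intra_channel_reward(img)


rng = np.random.default_rng(0)
clear = rng.uniform(0.0, 1.0, size=(32, 32, 3))
smoky = 0.6 * clear + 0.4  # toy veil: blend toward white
print(physics_reward(clear) > physics_reward(smoky))  # True
```

Both terms are global image statistics, so they cannot localize spatially varying smoke. The referee report below raises the same limitation about the paper's reward.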

Core claim

PhySe-RPO transforms deterministic restoration into a stochastic policy, enabling trajectory-level exploration and critic-free updates via group-relative optimization. A physics-guided reward imposes illumination and color consistency, while a visual-concept semantic reward learned from CLIP-based surgical concepts promotes smoke-free and anatomically coherent restoration.
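DDPO-style RL fine-tuning of diffusion models makes a sampler "stochastic" in the RL sense in a standard way. It treats each reverse step as a Gaussian action $x_{t-1} \sim \mathcal{N}(\mu_\theta(x_t, t), \sigma_t^2 I)$ and keeps that action's log-probability for the policy-gradient update. Whether PhySe-RPO uses exactly this parameterization is an assumption. The sketch below substitutes a hypothetical `toy_mean` for the denoising network.

```python
import numpy as np


def gaussian_step(mean, sigma, rng):
    """Sample x_{t-1} ~ N(mean, sigma^2 I); return (sample, log-probability)."""
    sample = mean + sigma * rng.standard_normal(mean.shape)
    d = mean.size
    log_prob = (-0.5 * np.sum((sample - mean) ** 2) / sigma**2
                - d * np.log(sigma) - 0.5 * d * np.log(2.0 * np.pi))
    return sample, float(log_prob)


def toy_mean(x_t, t):
    """Hypothetical stand-in for the network's predicted posterior mean."""
    return 0.9 * x_t


def rollout(x_T, sigmas, rng):
    """Run one stochastic trajectory from t = T-1 down to 0.
    Returns the final sample and the summed log-probability of all steps."""
    x, total_log_prob = x_T, 0.0
    for t in reversed(range(len(sigmas))):
        x, lp = gaussian_step(toy_mean(x, t), sigmas[t], rng)
        total_log_prob += lp
    return x, total_log_prob


x_T = np.ones((4, 4))
a, lp_a = rollout(x_T, [0.1, 0.1, 0.1], np.random.default_rng(0))
b, lp_b = rollout(x_T, [0.1, 0.1, 0.1], np.random.default_rng(1))
# Same start, different noise: two distinct trajectories to compare and rank.
```

As every sigma shrinks toward zero, the sampler becomes deterministic and the Gaussian log-probability degenerates. The stochastic reformulation is therefore what makes trajectory-level exploration possible.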

What carries the argument

Relative Policy Optimization (RPO), which turns the diffusion denoising process into a stochastic policy. The policy is updated with group-relative advantages computed from rewards that combine physics-based illumination and color consistency with CLIP-derived semantic coherence.
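The critic-free part comes from group-relative advantages, as in GRPO (DeepSeekMath, [27] in the reference list). For one smoky input, the method samples G trajectories and scores each. It then standardizes each score against its own group, so the group mean plays the role that a learned value baseline plays elsewhere. Below is a minimal sketch; the function name is hypothetical.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Standardise each of the G rewards against its own group.
    Above-average trajectories get positive advantages and are reinforced;
    below-average ones are pushed down. No value network is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


adv = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
print(np.round(adv, 2))  # approx. -1.34, -0.45, 0.45, 1.34
```

If all G samples receive the same reward, every advantage is zero and that group contributes no gradient. A larger G gives a less noisy baseline at a higher sampling cost.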

If this is right

Diffusion restoration pipelines gain the ability to explore multiple denoising trajectories instead of committing to a single deterministic path.
Illumination and color fidelity become explicit optimization targets rather than post-hoc checks.
Anatomical coherence can be promoted through semantic concepts without paired clean/smoky images.
The approach supplies a route to robust performance on real robotic surgery data where paired supervision remains scarce.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same reward structure could be adapted to other medical image restoration problems such as haze removal or low-light enhancement where physical consistency matters.
Uncertainty estimates derived from policy variance might flag regions where restoration is least reliable for surgeon review.
Replacing the CLIP semantic component with domain-specific visual encoders could further tighten anatomical fidelity in specialized procedures.
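The uncertainty idea above could be prototyped directly from the G samples the policy already draws. Restore the same frame G times, take the per-pixel spread, and flag the most variable pixels. This is a minimal sketch; the function names and the 95th-percentile cut-off are assumptions.

```python
import numpy as np


def uncertainty_map(samples):
    """Per-pixel standard deviation across G restorations of one frame.
    `samples` has shape (G, H, W, C); the result has shape (H, W)."""
    return samples.std(axis=0).mean(axis=-1)  # average over colour channels


def flag_uncertain(samples, quantile=0.95):
    """Boolean (H, W) mask of pixels whose spread exceeds the given quantile."""
    u = uncertainty_map(samples)
    return u > np.quantile(u, quantile)


samples = np.zeros((4, 8, 8, 3))
samples[:, 2, 3, :] = np.array([0.0, 1.0, 0.0, 1.0])[:, None]  # one unstable pixel
mask = flag_uncertain(samples)
print(mask.sum(), mask[2, 3])  # 1 True
```

The strict `>` comparison means a frame on which all G samples agree flags nothing.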

Load-bearing premise

The combination of physics and CLIP semantic rewards will consistently produce anatomically coherent outputs without introducing artifacts or losing critical details under real surgical lighting and tissue conditions.

What would settle it

Restored real surgical videos that show new color shifts, lost vessel detail, or persistent smoke artifacts after PhySe-RPO application would contradict the claim of reliable physical and semantic consistency.

Figures

Figures reproduced from arXiv: 2603.22844 by Bin Xu, Cheng Xue, Chunhui Liu, Ming Chen, Xiaowei Hu, Zining Fang.

**Figure 2.** Overview of the PhySe-RPO framework. PhySe-RPO refines the pretrained diffusion model through Group-relative Diffusion Policy Optimization, where multiple stochastic trajectories are sampled and optimized using physics-guided color priors, perceptual quality metrics, and semantic rewards, achieving physically consistent and clinically interpretable surgical smoke removal.

**Figure 3.** Visual-Concept Integration into Diffusion. (a) Learnable visual concepts are trained via contrastive learning to differentiate “clear” and “smoky” concepts in the semantic space. (b) The learned tokens are integrated into the diffusion backbone through multimodal fusion and temporal adaptation to guide semantically consistent desmoking.

**Figure 4.** Qualitative comparison on real-world surgical smoke images. Compared with prior desmoking methods, PhySe-RPO produces …

**Figure 5.** Reward convergence analysis. Average reward (a) and reward variance (b) under different semantic reward settings. The Full model converges faster with lower variance than Text-Reward and w/o RVC.

**Figure 6.** Qualitative comparison on synthetic images, compared to …

**Figure 7.** Output results for different α values.

From the accompanying Sec. 6.4 (Hyperparameter experiments): the authors further investigate two key hyperparameters, the length of the learnable token embedding and the number of groups G used in PhySe-RPO. The token length determines the expressiveness of the learned visual concepts.
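Figure 3 describes learnable "clear" and "smoky" concept embeddings in CLIP's semantic space. One plausible way to turn such concepts into a scalar reward is CLIP-style zero-shot scoring; this is an assumption, not taken from the paper. The score is a two-way softmax over the cosine similarities between the restored image's embedding and the two concept embeddings. The embeddings below are toy vectors standing in for CLIP image features and learned tokens. The 0.01 temperature mirrors the logit scale of 100 commonly used with CLIP, but it is likewise an assumption.

```python
import numpy as np


def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def semantic_reward(img_emb, clear_emb, smoky_emb, temperature=0.01):
    """Hypothetical R_sem: probability that the image matches the 'clear'
    concept rather than the 'smoky' one (two-way softmax over cosines)."""
    logits = np.array([cosine(img_emb, clear_emb),
                       cosine(img_emb, smoky_emb)]) / temperature
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])


clear_c = np.array([1.0, 0.0, 0.0])  # toy "clear" concept embedding
smoky_c = np.array([0.0, 1.0, 0.0])  # toy "smoky" concept embedding
print(semantic_reward(np.array([0.9, 0.1, 0.0]), clear_c, smoky_c) > 0.99)  # True
print(semantic_reward(np.array([0.0, 0.0, 1.0]), clear_c, smoky_c))         # 0.5
```

The score compares only global embeddings, so it cannot tell whether a particular vessel survived restoration. The referee's second major comment makes the same point.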

Original abstract

Surgical smoke severely degrades intraoperative video quality, obscuring anatomical structures and limiting surgical perception. Existing learning-based desmoking approaches rely on scarce paired supervision and deterministic restoration pipelines, making it difficult to perform exploration or reinforcement-driven refinement under real surgical conditions. We propose PhySe-RPO, a diffusion restoration framework optimized through Physics- and Semantics-Guided Relative Policy Optimization. The core idea is to transform deterministic restoration into a stochastic policy, enabling trajectory-level exploration and critic-free updates via group-relative optimization. A physics-guided reward imposes illumination and color consistency, while a visual-concept semantic reward learned from CLIP-based surgical concepts promotes smoke-free and anatomically coherent restoration. Together with a reference-free perceptual constraint, PhySe-RPO produces results that are physically consistent, semantically faithful, and clinically interpretable across synthetic and real robotic surgical datasets, providing a principled route to robust diffusion-based restoration under limited paired supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

PhySe-RPO reframes diffusion desmoking as a critic-free RL problem with physics and CLIP rewards, which is a clean move for limited-supervision settings but leaves local anatomical fidelity dependent on unproven global signals.

The main point is that this paper converts deterministic diffusion restoration into a stochastic policy optimized via group-relative updates, guided by an illumination/color physics reward and a CLIP-based semantic reward for smoke-free anatomy. That combination lets them train without paired clean-smoky images, which is the practical constraint in robotic surgery videos. The reference-free perceptual term is a sensible addition to keep outputs plausible on real data. They report results on both synthetic and real datasets, which is better than many abstract-only claims in this area. The approach is new in how it ties relative policy optimization specifically to these two reward types for this task. It builds on existing diffusion and RL ideas without obvious circularity. The soft spot is exactly the one the stress-test flags: the physics reward works on aggregate histograms and color constancy, so it does not directly penalize spatially varying smoke or subsurface tissue effects. CLIP supplies only coarse concept alignment and carries no explicit edge or structure prior. Because the policy is stochastic and updated without a critic, nothing forces the model to preserve fine vessels or boundaries once the global scores look good. On real unpaired data this assumption is doing a lot of work, and the abstract gives no ablations or per-structure metrics to show it holds. Readers working on medical image restoration or RL for generative vision models will find the framing useful. The idea is concrete enough and the application clear enough that it deserves a serious referee, even if the local fidelity question will need tighter evidence in revision.

Referee Report

2 major / 1 minor

Summary. The paper proposes PhySe-RPO, a diffusion-based surgical smoke removal framework that reformulates deterministic restoration as a stochastic policy optimized via group-relative policy optimization. It introduces a physics-guided reward enforcing illumination and color consistency together with a CLIP-derived semantic reward for smoke-free, anatomically coherent outputs, plus a reference-free perceptual constraint, claiming physically consistent and clinically interpretable results on both synthetic and real robotic surgical datasets under limited paired supervision.

Significance. If the empirical claims are substantiated, the work would offer a principled way to incorporate domain-specific physics and semantic priors into diffusion restoration via critic-free RL, addressing the scarcity of paired data in intraoperative video enhancement and potentially improving robustness for real surgical conditions where smoke obscures critical anatomy.

major comments (2)

[Abstract] Abstract: the central claim that PhySe-RPO 'produces results that are physically consistent, semantically faithful, and clinically interpretable' across datasets is unsupported by any quantitative metrics, ablation studies, tables, or figures in the manuscript text, rendering the effectiveness of the combined reward formulation impossible to evaluate.
[Method] Method description (rewards section): the physics-guided reward is defined on aggregate illumination histograms and color constancy, which does not model spatially varying smoke scattering or subsurface tissue optics; combined with the coarse CLIP semantic term, this leaves the guarantee of local anatomical fidelity (vessels, tissue boundaries) without an explicit penalty for hallucination or erasure once global consistency is met.

minor comments (1)

Notation for the group-relative optimization and the precise weighting between physics, semantic, and perceptual rewards should be formalized with equations and hyper-parameter values for reproducibility.
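The following illustrates the kind of formalization requested. It assumes PhySe-RPO follows the GRPO objective of DeepSeekMath ([27]) applied to each denoising step. Here $R_{PG}$ is the physics reward, $R_{\mathrm{sem}}$ the semantic reward, and $R_{\mathrm{perc}}$ the reference-free perceptual term. $c$ is the smoky input frame and $G$ the group size. The weights $\lambda$, clip range $\varepsilon$, stability constant $\delta$, and KL coefficient $\beta$ are placeholders, not values reported by the paper.

```latex
\begin{aligned}
R^{(i)} &= \lambda_{\mathrm{phys}}\, R_{PG}\big(x_0^{(i)}\big) + \lambda_{\mathrm{sem}}\, R_{\mathrm{sem}}\big(x_0^{(i)}\big) + \lambda_{\mathrm{perc}}\, R_{\mathrm{perc}}\big(x_0^{(i)}\big), \\
A^{(i)} &= \frac{R^{(i)} - \operatorname{mean}_{j} R^{(j)}}{\operatorname{std}_{j} R^{(j)} + \delta}, \qquad
\rho_t^{(i)} = \frac{p_\theta\big(x_{t-1}^{(i)} \mid x_t^{(i)}, c\big)}{p_{\theta_{\mathrm{old}}}\big(x_{t-1}^{(i)} \mid x_t^{(i)}, c\big)}, \\
\mathcal{J}(\theta) &= \mathbb{E}\Bigg[\frac{1}{G}\sum_{i=1}^{G} \frac{1}{T}\sum_{t=1}^{T} \min\Big(\rho_t^{(i)} A^{(i)},\ \operatorname{clip}\big(\rho_t^{(i)}, 1-\varepsilon, 1+\varepsilon\big) A^{(i)}\Big)\Bigg] - \beta\, \mathbb{D}_{\mathrm{KL}}\big[p_\theta \,\|\, p_{\mathrm{ref}}\big].
\end{aligned}
```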

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of clarity and rigor in presenting our claims and method. We address each major comment point-by-point below, indicating the revisions we will incorporate.

Referee: [Abstract] Abstract: the central claim that PhySe-RPO 'produces results that are physically consistent, semantically faithful, and clinically interpretable' across datasets is unsupported by any quantitative metrics, ablation studies, tables, or figures in the manuscript text, rendering the effectiveness of the combined reward formulation impossible to evaluate.

Authors: We agree that the abstract claim requires direct linkage to supporting evidence in the manuscript. In the revised version, we will update the abstract to explicitly reference the quantitative metrics (PSNR, SSIM, LPIPS), ablation studies, and visual results from Sections 4 and 5, including tables and figures on both synthetic and real datasets. This will substantiate the claims of physical consistency, semantic fidelity, and clinical interpretability without altering the core contribution. revision: yes
Referee: [Method] Method description (rewards section): the physics-guided reward is defined on aggregate illumination histograms and color constancy, which does not model spatially varying smoke scattering or subsurface tissue optics; combined with the coarse CLIP semantic term, this leaves the guarantee of local anatomical fidelity (vessels, tissue boundaries) without an explicit penalty for hallucination or erasure once global consistency is met.

Authors: We acknowledge the limitation in the reward formulation: the physics-guided component relies on global histogram-based illumination and color constancy for computational efficiency under limited supervision, and the CLIP term provides high-level semantic guidance rather than pixel-level local constraints. This design choice prioritizes robustness in real surgical scenarios where detailed physics models are unavailable. However, the reference-free perceptual constraint and trajectory-level exploration in the group-relative optimization empirically reduce local artifacts, as validated in our real robotic dataset experiments. We will add a new paragraph in the Discussion section explicitly discussing this limitation, including why spatially varying scattering is not modeled, and outline potential future extensions such as incorporating local physics priors. Additional qualitative close-ups of vessel and tissue boundaries will be included to demonstrate local fidelity. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The paper defines a new stochastic policy optimization framework (PhySe-RPO) by introducing physics-guided illumination/color rewards and CLIP-derived semantic rewards as independent components applied to diffusion trajectories. These rewards are constructed from external principles and models rather than fitted parameters or self-referential equations. No load-bearing steps reduce predictions to inputs by construction, no self-citation chains justify uniqueness, and no ansatzes or renamings of known results are smuggled in. The central claims rest on the joint sufficiency of the newly specified rewards under limited supervision, which is an independent modeling choice rather than a definitional tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The approach rests on standard assumptions from diffusion models and reinforcement learning plus paper-specific reward designs; no free parameters or new entities are explicitly introduced in the abstract.

axioms (3)

domain assumption: Diffusion models can be reframed as stochastic policies for image restoration.
Central to converting deterministic restoration into explorable trajectories.
domain assumption: CLIP embeddings reliably capture surgical visual concepts for reward computation.
Used to define the semantic reward promoting anatomically coherent outputs.
ad hoc to paper: Illumination and color consistency constraints are sufficient physics guidance for smoke removal.
Imposed as the physics-guided reward component.

pith-pipeline@v0.9.0 · 5466 in / 1394 out tokens · 75434 ms · 2026-05-15T01:19:11.809734+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

A physics-guided reward imposes illumination and color consistency, while a visual-concept semantic reward learned from CLIP-based surgical concepts promotes smoke-free and anatomically coherent restoration.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Relation between the paper passage and the cited Recognition theorem.

Inter-channel Prior... Intra-channel Prior... $R_{PG} = R_A + R_B$

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 3 internal anchors

[1] Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6228–6237, 2018.

[2] Wenhui Chang, Yufeng Li, Zebang Zhu, and Yuchen Yang. Lsd3k: A benchmark for smoke removal from laparoscopic surgery images. In 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC), pages 1–5. IEEE, 2024.

[3] Long Chen, Wen Tang, Nigel W John, Tao Ruan Wan, and Jian Jun Zhang. De-smokegcn: generative cooperative networks for joint surgical smoke detection and removal. IEEE Transactions on Medical Imaging, 39(5):1615–1625, 2019.

[4] Renjie Chen, Wenfeng Lin, Yichen Zhang, Jiangchuan Wei, Boyuan Liu, Chao Feng, Jiao Ran, and Mingyu Guo. Towards self-improvement of diffusion models via group preference optimization. arXiv preprint arXiv:2505.11070.

[5] Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, and Luping Zhou. Lightdiff: surgical endoscopic image low-light enhancement with t-diffusion. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 369–379. Springer, 2024.

[6] Lark Kwon Choi, Jaehee You, and Alan Conrad Bovik. Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Transactions on Image Processing, 24(11):3888–3901, 2015.

[7] Zixuan Fu, Lanqing Guo, Chong Wang, Yufei Wang, Zhihao Li, and Bihan Wen. Temporal as a plugin: Unsupervised video denoising with pre-trained image denoisers. In European Conference on Computer Vision, pages 349–367. Springer, 2024.

[8] Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5812–5820, 2022.

[9] Yanjiang Guo, Jianke Zhang, Xiaoyu Chen, Xiang Ji, Yen-Jen Wang, Yucheng Hu, and Jianyu Chen. Improving vision-language-action model with online reinforcement learning. arXiv preprint arXiv:2501.16664, 2025.

[10] Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(12):2341–2353.

[11] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

[12] Zhisen Hu and Xiyuan Hu. Cycle-consistent adversarial networks for smoke detection and removal in endoscopic images. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 3070–3073. IEEE, 2021.

[13] Yeying Jin, Wending Yan, Wenhan Yang, and Robby T Tan. Structure representation network and uncertainty feedback learning for dense non-uniform fog removal. In Asian Conference on Computer Vision, pages 155–172. Springer, 2022.

[14] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5148–5157, 2021.

[15] Kang Liao, Zongsheng Yue, Zhouxia Wang, and Chen Change Loy. Denoising as adaptation: Noise-space domain adaptation for image restoration. arXiv preprint arXiv:2406.18516, 2024.

[16] Wang Lin, Liyu Jia, Wentao Hu, Kaihang Pan, Zhongqi Yue, Wei Zhao, Jingyuan Chen, Fei Wu, and Hanwang Zhang. Reasoning physical video generation with diffusion timestep tokens via reinforcement learning. arXiv preprint arXiv:2504.15932, 2025.

[17] Lixiong Liu, Bao Liu, Hua Huang, and Alan Conrad Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal Processing: Image Communication, 29(8):856–863, 2014.

[18] LiPing Lu, Qian Xiong, Bingrong Xu, and Duanfeng Chu. Mixdehazenet: Mix structure block for image dehazing network. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1–10. IEEE, 2024.

[19] Xiongbiao Luo, A Jonathan McLeod, Stephen E Pautler, Christopher M Schlachta, and Terry M Peters. Vision-based surgical field defogging. IEEE Transactions on Medical Imaging, 36(10):2021–2030, 2017.

[20] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Controlling vision-language models for multi-task image restoration. arXiv preprint arXiv:2310.01018, 2023.

[21] Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images. Nature Communications, 15(1):654, 2024.

[22] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters, 20(3):209–212, 2012.

[23] Jiadong Pan, Zhiyuan Ma, Kaiyan Zhang, Ning Ding, and Bowen Zhou. Self-reflective reinforcement learning for diffusion-based image reasoning generation. arXiv preprint arXiv:2505.22407, 2025.

[24] Yirou Pan, Sophia Bano, Francisco Vasconcelos, Hyun Park, Taikyeong Ted Jeong, and Danail Stoyanov. Desmoke-lap: improved unpaired image-to-image translation for desmoking in laparoscopic surgery. International Journal of Computer Assisted Radiology and Surgery, 17(5):885–893, 2022.

[25] Sebastián Salazar-Colores, Hugo Moreno Jiménez, César Javier Ortiz-Echeverri, and Gerardo Flores. Desmoking laparoscopy surgery images using an image-to-image translation guided by an embedded dark channel. IEEE Access, 8:208898–208909, 2020.

[26] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. Advances in Neural Information Processing Systems, 29, 2016.

[27] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Y Wu, et al. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024. URL https://arxiv.org/abs/2402.03300.

[28] Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, and Weidong Cai. Amncutter: Affinity-attention-guided multi-view normalized cutter for unsupervised surgical instrument segmentation. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 4533–4544. IEEE, 2025.

[29] Oleksii Sidorov, Congcong Wang, and Faouzi Alaya Cheikh. Generative smoke removal. In Machine Learning for Health Workshop, pages 81–92. PMLR, 2020.

[30] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

[31] Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023.

[32] Xinpei Su and Qiuxia Wu. Multi-stages de-smoking model based on cyclegan for surgical de-smoking. International Journal of Machine Learning and Cybernetics, 14(11):3965–3978, 2023.

[33] Vishal Venkatesh, Neeraj Sharma, Vivek Srivastava, and Munendra Singh. Unsupervised smoke to desmoked laparoscopic surgery images using contrast driven cyclic-desmokegan. Computers in Biology and Medicine, 123:103873, 2020.

[34] Congcong Wang, Faouzi Alaya Cheikh, Mounir Kaaniche, Azeddine Beghdadi, and Ole Jacob Elle. Variational based smoke removal in laparoscopic images. Biomedical Engineering Online, 17(1):139, 2018.

[35] Feng Wang, Xinan Sun, and Jinhua Li. Surgical smoke removal via residual swin transformer network. International Journal of Computer Assisted Radiology and Surgery, 18(8):1417–1427, 2023.

[36] Junke Wang, Zhi Tian, Xun Wang, Xinyu Zhang, Weilin Huang, Zuxuan Wu, and Yu-Gang Jiang. Simplear: Pushing the frontier of autoregressive visual generation through pretraining, sft, and rl. arXiv preprint arXiv:2504.11455, 2025.

[37] Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, and Wangmeng Zuo. Self-supervised video desmoking for laparoscopic surgery. In European Conference on Computer Vision, pages 307–… Springer, 2024.

[39] Wenyao Xia, Victoria Fan, Terry Peters, and Elvis CS Chen. A new benchmark in vivo paired dataset for laparoscopic image de-smoking, 2024.

[40] Cheng Xue, Shiyu Zhao, Danqiong Wang, Cheng Chen, Guanyu Yang, and Yang Chen. Td-sam: Temporal and distance-guided adaptations of sam for accurate surgical instrument segmentation. IEEE Journal of Biomedical and Health Informatics, 2025.

[41] Jia Yan, Jie Li, and Xin Fu. No-reference quality assessment of contrast-distorted images using contrast enhancement. arXiv preprint arXiv:1904.08879, 2019.

[42] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1191–1200, 2022.

[43] Zuyao You and Zuxuan Wu. Seg-r1: Segmentation can be surprisingly simple with reinforcement learning. arXiv preprint arXiv:2506.22624, 2025.

[44] Jiale Zhang, Wenfeng Huang, Xiangyun Liao, and Qiong Wang. Progressive frequency-aware network for laparoscopic image desmoking. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 479–492. Springer, 2023.

[45] Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma. Blind image quality assessment via vision-language correspondence: A multitask learning perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14071–14081, 2023.

[46] Lirong Zheng, Yanshan Li, Rui Yu, and Kaihao Zhang. Efficient dual-domain image dehazing with haze prior perception. arXiv preprint arXiv:2507.11035, 2025.

[47] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348.

[48] Yichao Zhou, Zhisen Hu, Zuxing Xuan, Yangang Wang, and Xiyuan Hu. Synchronizing detection and removal of smoke in endoscopic images with cyclic consistency adversarial nets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21(4):670–680, 2022.

[49] Appendix excerpt (extracted in place of a reference): Additional Experimental Results. 6.1. Results on Synthetic Datasets. To verify that our model has basic desmoking capabilities at cold start, we tested it on a synthetic dataset. The experimental results are shown in Table 6; our method achieves the best overall performance across all four evaluation metrics, demonstrating strong desmoking ability even a...

[1] Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 6228–6237, 2018.

[2] Wenhui Chang, Yufeng Li, Zebang Zhu, and Yuchen Yang. Lsd3k: A benchmark for smoke removal from laparoscopic surgery images. In 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC), pages 1–5. IEEE, 2024.

[3] Long Chen, Wen Tang, Nigel W John, Tao Ruan Wan, and Jian Jun Zhang. De-smokegcn: generative cooperative networks for joint surgical smoke detection and removal. IEEE transactions on medical imaging, 39(5):1615–1625, 2019.

[4] Renjie Chen, Wenfeng Lin, Yichen Zhang, Jiangchuan Wei, Boyuan Liu, Chao Feng, Jiao Ran, and Mingyu Guo. Towards self-improvement of diffusion models via group preference optimization. arXiv preprint arXiv:2505.11070.

[5] Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, and Luping Zhou. Lightdiff: surgical endoscopic image low-light enhancement with t-diffusion. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 369–379. Springer, 2024.

[6] Lark Kwon Choi, Jaehee You, and Alan Conrad Bovik. Referenceless prediction of perceptual fog density and perceptual image defogging. IEEE Transactions on Image Processing, 24(11):3888–3901, 2015.

[7] Zixuan Fu, Lanqing Guo, Chong Wang, Yufei Wang, Zhihao Li, and Bihan Wen. Temporal as a plugin: Unsupervised video denoising with pre-trained image denoisers. In European Conference on Computer Vision, pages 349–367. Springer, 2024.

[8] Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022.

[9] Yanjiang Guo, Jianke Zhang, Xiaoyu Chen, Xiang Ji, Yen-Jen Wang, Yucheng Hu, and Jianyu Chen. Improving vision-language-action model with online reinforcement learning. arXiv preprint arXiv:2501.16664, 2025.

[10] Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior. IEEE transactions on pattern analysis and machine intelligence, 33(12):2341–2353.

[11] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

[12] Zhisen Hu and Xiyuan Hu. Cycle-consistent adversarial networks for smoke detection and removal in endoscopic images. In 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 3070–3073. IEEE, 2021.

[13] Yeying Jin, Wending Yan, Wenhan Yang, and Robby T Tan. Structure representation network and uncertainty feedback learning for dense non-uniform fog removal. In Asian Conference on Computer Vision, pages 155–172. Springer, 2022.

[14] Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021.

[15] Kang Liao, Zongsheng Yue, Zhouxia Wang, and Chen Change Loy. Denoising as adaptation: Noise-space domain adaptation for image restoration. arXiv preprint arXiv:2406.18516, 2024.

[16] Wang Lin, Liyu Jia, Wentao Hu, Kaihang Pan, Zhongqi Yue, Wei Zhao, Jingyuan Chen, Fei Wu, and Hanwang Zhang. Reasoning physical video generation with diffusion timestep tokens via reinforcement learning. arXiv preprint arXiv:2504.15932, 2025.

[17] Lixiong Liu, Bao Liu, Hua Huang, and Alan Conrad Bovik. No-reference image quality assessment based on spatial and spectral entropies. Signal processing: Image communication, 29(8):856–863, 2014.

[18] LiPing Lu, Qian Xiong, Bingrong Xu, and Duanfeng Chu. Mixdehazenet: Mix structure block for image dehazing network. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1–10. IEEE, 2024.

[19] Xiongbiao Luo, A Jonathan McLeod, Stephen E Pautler, Christopher M Schlachta, and Terry M Peters. Vision-based surgical field defogging. IEEE transactions on medical imaging, 36(10):2021–2030, 2017.

[20] Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sjölund, and Thomas B Schön. Controlling vision-language models for multi-task image restoration. arXiv preprint arXiv:2310.01018, 2023.

[21] Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images. Nature communications, 15(1):654, 2024.

[22] Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Making a “completely blind” image quality analyzer. IEEE Signal processing letters, 20(3):209–212, 2012.

[23] Jiadong Pan, Zhiyuan Ma, Kaiyan Zhang, Ning Ding, and Bowen Zhou. Self-reflective reinforcement learning for diffusion-based image reasoning generation. arXiv preprint arXiv:2505.22407, 2025.

[24] Yirou Pan, Sophia Bano, Francisco Vasconcelos, Hyun Park, Taikyeong Ted Jeong, and Danail Stoyanov. Desmoke-lap: improved unpaired image-to-image translation for desmoking in laparoscopic surgery. International Journal of Computer Assisted Radiology and Surgery, 17(5):885–893, 2022.

[25] Sebastián Salazar-Colores, Hugo Moreno Jiménez, César Javier Ortiz-Echeverri, and Gerardo Flores. Desmoking laparoscopy surgery images using an image-to-image translation guided by an embedded dark channel. IEEE Access, 8:208898–208909, 2020.

[26] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. Advances in neural information processing systems, 29, 2016.

[27] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Y Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300.

[28] Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, and Weidong Cai. Amncutter: Affinity-attention-guided multi-view normalized cutter for unsupervised surgical instrument segmentation. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 4533–4544. IEEE, 2025.

[29] Oleksii Sidorov, Congcong Wang, and Faouzi Alaya Cheikh. Generative smoke removal. In Machine Learning for Health Workshop, pages 81–92. PMLR, 2020.

[30] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

[31] Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing. IEEE Transactions on Image Processing, 32:1927–1941, 2023.

[32] Xinpei Su and Qiuxia Wu. Multi-stages de-smoking model based on cyclegan for surgical de-smoking. International Journal of Machine Learning and Cybernetics, 14(11):3965–3978, 2023.

[33] Vishal Venkatesh, Neeraj Sharma, Vivek Srivastava, and Munendra Singh. Unsupervised smoke to desmoked laparoscopic surgery images using contrast driven cyclic-desmokegan. Computers in Biology and Medicine, 123:103873, 2020.

[34] Congcong Wang, Faouzi Alaya Cheikh, Mounir Kaaniche, Azeddine Beghdadi, and Ole Jacob Elle. Variational based smoke removal in laparoscopic images. Biomedical engineering online, 17(1):139, 2018.

[35] Feng Wang, Xinan Sun, and Jinhua Li. Surgical smoke removal via residual swin transformer network. International Journal of Computer Assisted Radiology and Surgery, 18(8):1417–1427, 2023.

[36] Junke Wang, Zhi Tian, Xun Wang, Xinyu Zhang, Weilin Huang, Zuxuan Wu, and Yu-Gang Jiang. Simplear: Pushing the frontier of autoregressive visual generation through pretraining, sft, and rl. arXiv preprint arXiv:2504.11455, 2025.

[37] Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, and Wangmeng Zuo. Self-supervised video desmoking for laparoscopic surgery. In European Conference on Computer Vision, pages 307–. Springer, 2024.

[39] Wenyao Xia, Victoria Fan, Terry Peters, and Elvis CS Chen. A new benchmark in vivo paired dataset for laparoscopic image de-smoking, 2024.

[40] Cheng Xue, Shiyu Zhao, Danqiong Wang, Cheng Chen, Guanyu Yang, and Yang Chen. Td-sam: Temporal and distance-guided adaptations of sam for accurate surgical instrument segmentation. IEEE Journal of Biomedical and Health Informatics, 2025.

[41] Jia Yan, Jie Li, and Xin Fu. No-reference quality assessment of contrast-distorted images using contrast enhancement. arXiv preprint arXiv:1904.08879, 2019.

[42] Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1191–1200, 2022.

[43] Zuyao You and Zuxuan Wu. Seg-r1: Segmentation can be surprisingly simple with reinforcement learning. arXiv preprint arXiv:2506.22624, 2025.

[44] Jiale Zhang, Wenfeng Huang, Xiangyun Liao, and Qiong Wang. Progressive frequency-aware network for laparoscopic image desmoking. In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 479–492. Springer, 2023.

[45] Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma. Blind image quality assessment via vision-language correspondence: A multitask learning perspective. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14071–14081, 2023.

[46] Lirong Zheng, Yanshan Li, Rui Yu, and Kaihao Zhang. Efficient dual-domain image dehazing with haze prior perception. arXiv preprint arXiv:2507.11035, 2025.

[47] Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9):2337–2348.

[48] Yichao Zhou, Zhisen Hu, Zuxing Xuan, Yangang Wang, and Xiyuan Hu. Synchronizing detection and removal of smoke in endoscopic images with cyclic consistency adversarial nets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 21(4):670–680, 2022.

6. Additional Experimental Results

6.1. Results on Synthetic Datasets

To verify that our model has basic desmoking capabilities at cold start, we tested it on a synthetic dataset. The experimental results are shown in Table 6; our method achieves the best overall performance across all four evaluation metrics, demonstrating strong desmoking ability even a...