pith. sign in

arxiv: 2603.22844 · v4 · submitted 2026-03-24 · 💻 cs.AI

PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based Surgical Smoke Removal

Pith reviewed 2026-05-15 01:19 UTC · model grok-4.3

classification 💻 cs.AI
keywords surgical smoke removaldiffusion modelsrelative policy optimizationphysics-guided rewardsemantic rewardrobotic surgeryimage restoration
0
0 comments X

The pith

PhySe-RPO converts diffusion-based surgical smoke removal into a stochastic policy guided by physics and semantic rewards for consistent restoration under limited data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deterministic diffusion restoration pipelines can be reframed as explorable stochastic policies optimized through group-relative updates. This shift allows the model to incorporate a physics-guided reward enforcing illumination and color consistency alongside a CLIP-derived semantic reward that favors smoke-free and anatomically coherent outputs. A reference-free perceptual constraint further stabilizes results. The resulting framework is tested on both synthetic and real robotic surgical datasets, demonstrating restoration that remains physically plausible and semantically faithful without requiring extensive paired supervision.

Core claim

PhySe-RPO transforms deterministic restoration into a stochastic policy, enabling trajectory-level exploration and critic-free updates via group-relative optimization. A physics-guided reward imposes illumination and color consistency, while a visual-concept semantic reward learned from CLIP-based surgical concepts promotes smoke-free and anatomically coherent restoration.

What carries the argument

Relative Policy Optimization (RPO) that turns the diffusion denoising process into a stochastic policy updated by group-relative rewards combining physics-based illumination consistency and CLIP-derived semantic coherence.

If this is right

  • Diffusion restoration pipelines gain the ability to explore multiple denoising trajectories instead of committing to a single deterministic path.
  • Illumination and color fidelity become explicit optimization targets rather than post-hoc checks.
  • Anatomical coherence can be promoted through semantic concepts without paired clean-smoke image pairs.
  • The approach supplies a route to robust performance on real robotic surgery data where paired supervision remains scarce.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reward structure could be adapted to other medical image restoration problems such as haze removal or low-light enhancement where physical consistency matters.
  • Uncertainty estimates derived from policy variance might flag regions where restoration is least reliable for surgeon review.
  • Replacing the CLIP semantic component with domain-specific visual encoders could further tighten anatomical fidelity in specialized procedures.

Load-bearing premise

The combination of physics and CLIP semantic rewards will consistently produce anatomically coherent outputs without introducing artifacts or losing critical details under real surgical lighting and tissue conditions.

What would settle it

Restored real surgical videos that show new color shifts, lost vessel detail, or persistent smoke artifacts after PhySe-RPO application would contradict the claim of reliable physical and semantic consistency.

Figures

Figures reproduced from arXiv: 2603.22844 by Bin Xu, Cheng Xue, Chunhui Liu, Ming Chen, Xiaowei Hu, Zining Fang.

Figure 1
Figure 1. Figure 1: Limitations of existing restoration approaches: lack [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of the PhySe-RPO framework. PhySe-RPO refines the pretrained diffusion model through Group-relative Diffusion Policy Optimization, where multiple stochastic trajectories are sampled and optimized using physics-guided color priors, perceptual quality metrics, and semantic rewards, achieving physically consistent and clinically interpretable surgical smoke removal. eration, which naturally supports … view at source ↗
Figure 3
Figure 3. Figure 3: Visual-Concept Integration into Diffusion. (a) Learn￾able visual concepts are trained via contrastive learning to dif￾ferentiate “clear” and “smoky” concepts in the semantic space. (b) The learned tokens are integrated into the diffusion backbone through multimodal fusion and temporal adaptation to guide se￾mantically consistent desmoking. 3.3. Visual-Concept Semantic Reward While physics-based priors regu… view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative comparison on real-world surgical smoke images. Compared with prior desmoking methods, PhySe-RPO produces [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Reward convergence analysis. Average reward (a) and reward variance (b) under different semantic reward settings. The Full model converges faster with lower variance than Text-Reward and w/o RVC. of the proposed visual-concept semantic reward with two comparisons. First, [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Qualitative comparison on synthetic images,compared to [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Output results for different α values 6.4. Hyperparameter experiments We further investigate two key hyperparameters in our framework: the length of the learning token embedding and the number of groups G used in the PhySe-RPO. The learning token length determines the expressiveness of the learned visual concepts. As shown in [PITH_FULL_IMAGE:figures/full_fig_p012_7.png] view at source ↗
read the original abstract

Surgical smoke severely degrades intraoperative video quality, obscuring anatomical structures and limiting surgical perception. Existing learning-based desmoking approaches rely on scarce paired supervision and deterministic restoration pipelines, making it difficult to perform exploration or reinforcement-driven refinement under real surgical conditions. We propose PhySe-RPO, a diffusion restoration framework optimized through Physics- and Semantics-Guided Relative Policy Optimization. The core idea is to transform deterministic restoration into a stochastic policy, enabling trajectory-level exploration and critic-free updates via group-relative optimization. A physics-guided reward imposes illumination and color consistency, while a visual-concept semantic reward learned from CLIP-based surgical concepts promotes smoke-free and anatomically coherent restoration. Together with a reference-free perceptual constraint, PhySe-RPO produces results that are physically consistent, semantically faithful, and clinically interpretable across synthetic and real robotic surgical datasets, providing a principled route to robust diffusion-based restoration under limited paired supervision.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes PhySe-RPO, a diffusion-based surgical smoke removal framework that reformulates deterministic restoration as a stochastic policy optimized via group-relative policy optimization. It introduces a physics-guided reward enforcing illumination and color consistency together with a CLIP-derived semantic reward for smoke-free, anatomically coherent outputs, plus a reference-free perceptual constraint, claiming physically consistent and clinically interpretable results on both synthetic and real robotic surgical datasets under limited paired supervision.

Significance. If the empirical claims are substantiated, the work would offer a principled way to incorporate domain-specific physics and semantic priors into diffusion restoration via critic-free RL, addressing the scarcity of paired data in intraoperative video enhancement and potentially improving robustness for real surgical conditions where smoke obscures critical anatomy.

major comments (2)
  1. [Abstract] Abstract: the central claim that PhySe-RPO 'produces results that are physically consistent, semantically faithful, and clinically interpretable' across datasets is unsupported by any quantitative metrics, ablation studies, tables, or figures in the manuscript text, rendering the effectiveness of the combined reward formulation impossible to evaluate.
  2. [Method] Method description (rewards section): the physics-guided reward is defined on aggregate illumination histograms and color constancy, which does not model spatially varying smoke scattering or subsurface tissue optics; combined with the coarse CLIP semantic term, this leaves the guarantee of local anatomical fidelity (vessels, tissue boundaries) without an explicit penalty for hallucination or erasure once global consistency is met.
minor comments (1)
  1. Notation for the group-relative optimization and the precise weighting between physics, semantic, and perceptual rewards should be formalized with equations and hyper-parameter values for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of clarity and rigor in presenting our claims and method. We address each major comment point-by-point below, indicating the revisions we will incorporate.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that PhySe-RPO 'produces results that are physically consistent, semantically faithful, and clinically interpretable' across datasets is unsupported by any quantitative metrics, ablation studies, tables, or figures in the manuscript text, rendering the effectiveness of the combined reward formulation impossible to evaluate.

    Authors: We agree that the abstract claim requires direct linkage to supporting evidence in the manuscript. In the revised version, we will update the abstract to explicitly reference the quantitative metrics (PSNR, SSIM, LPIPS), ablation studies, and visual results from Sections 4 and 5, including tables and figures on both synthetic and real datasets. This will substantiate the claims of physical consistency, semantic fidelity, and clinical interpretability without altering the core contribution. revision: yes

  2. Referee: [Method] Method description (rewards section): the physics-guided reward is defined on aggregate illumination histograms and color constancy, which does not model spatially varying smoke scattering or subsurface tissue optics; combined with the coarse CLIP semantic term, this leaves the guarantee of local anatomical fidelity (vessels, tissue boundaries) without an explicit penalty for hallucination or erasure once global consistency is met.

    Authors: We acknowledge the limitation in the reward formulation: the physics-guided component relies on global histogram-based illumination and color constancy for computational efficiency under limited supervision, and the CLIP term provides high-level semantic guidance rather than pixel-level local constraints. This design choice prioritizes robustness in real surgical scenarios where detailed physics models are unavailable. However, the reference-free perceptual constraint and trajectory-level exploration in the group-relative optimization empirically reduce local artifacts, as validated in our real robotic dataset experiments. We will add a new paragraph in the Discussion section explicitly discussing this limitation, including why spatially varying scattering is not modeled, and outline potential future extensions such as incorporating local physics priors. Additional qualitative close-ups of vessel and tissue boundaries will be included to demonstrate local fidelity. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper defines a new stochastic policy optimization framework (PhySe-RPO) by introducing physics-guided illumination/color rewards and CLIP-derived semantic rewards as independent components applied to diffusion trajectories. These rewards are constructed from external principles and models rather than fitted parameters or self-referential equations. No load-bearing steps reduce predictions to inputs by construction, no self-citation chains justify uniqueness, and no ansatzes or renamings of known results are smuggled in. The central claims rest on the joint sufficiency of the newly specified rewards under limited supervision, which is an independent modeling choice rather than a definitional tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The approach rests on standard assumptions from diffusion models and reinforcement learning plus paper-specific reward designs; no free parameters or new entities are explicitly introduced in the abstract.

axioms (3)
  • domain assumption Diffusion models can be reframed as stochastic policies for image restoration
    Central to converting deterministic restoration into explorable trajectories
  • domain assumption CLIP embeddings reliably capture surgical visual concepts for reward computation
    Used to define the semantic reward promoting anatomically coherent outputs
  • ad hoc to paper Illumination and color consistency constraints are sufficient physics guidance for smoke removal
    Imposed as the physics-guided reward component

pith-pipeline@v0.9.0 · 5466 in / 1394 out tokens · 75434 ms · 2026-05-15T01:19:11.809734+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 3 internal anchors

  1. [1]

    The perception-distortion tradeoff

    Yochai Blau and Tomer Michaeli. The perception-distortion tradeoff. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 6228–6237, 2018. 6

  2. [2]

    Lsd3k: A benchmark for smoke removal from laparoscopic surgery images

    Wenhui Chang, Yufeng Li, Zebang Zhu, and Yuchen Yang. Lsd3k: A benchmark for smoke removal from laparoscopic surgery images. In2024 3rd International Conference on Ar- tificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC), pages 1–5. IEEE, 2024. 1, 2

  3. [3]

    De-smokegcn: generative cooperative net- works for joint surgical smoke detection and removal.IEEE transactions on medical imaging, 39(5):1615–1625, 2019

    Long Chen, Wen Tang, Nigel W John, Tao Ruan Wan, and Jian Jun Zhang. De-smokegcn: generative cooperative net- works for joint surgical smoke detection and removal.IEEE transactions on medical imaging, 39(5):1615–1625, 2019. 2

  4. [4]

    To- wards self-improvement of diffusion models via group pref- erence optimization.arXiv preprint arXiv:2505.11070,

    Renjie Chen, Wenfeng Lin, Yichen Zhang, Jiangchuan Wei, Boyuan Liu, Chao Feng, Jiao Ran, and Mingyu Guo. To- wards self-improvement of diffusion models via group pref- erence optimization.arXiv preprint arXiv:2505.11070,

  5. [5]

    Lightdiff: surgical endoscopic image low-light enhancement with t-diffusion

    Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, and Luping Zhou. Lightdiff: surgical endoscopic image low-light enhancement with t-diffusion. InInternational Conference on Medical Im- age Computing and Computer-Assisted Intervention, pages 369–379. Springer, 2024. 1, 2, 6, 7

  6. [6]

    Ref- erenceless prediction of perceptual fog density and percep- tual image defogging.IEEE Transactions on Image Process- ing, 24(11):3888–3901, 2015

    Lark Kwon Choi, Jaehee You, and Alan Conrad Bovik. Ref- erenceless prediction of perceptual fog density and percep- tual image defogging.IEEE Transactions on Image Process- ing, 24(11):3888–3901, 2015. 6

  7. [7]

    Temporal as a plugin: Unsuper- vised video denoising with pre-trained image denoisers

    Zixuan Fu, Lanqing Guo, Chong Wang, Yufei Wang, Zhi- hao Li, and Bihan Wen. Temporal as a plugin: Unsuper- vised video denoising with pre-trained image denoisers. In European Conference on Computer Vision, pages 349–367. Springer, 2024. 1, 2, 6, 7

  8. [8]

    Image dehazing transformer with transmission-aware 3d position embedding

    Chun-Le Guo, Qixin Yan, Saeed Anwar, Runmin Cong, Wenqi Ren, and Chongyi Li. Image dehazing transformer with transmission-aware 3d position embedding. InProceed- ings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5812–5820, 2022. 6, 7, 1

  9. [9]

    Improving vision-language-action model with online reinforcement learning.arXiv preprint arXiv:2501.16664,

    Yanjiang Guo, Jianke Zhang, Xiaoyu Chen, Xiang Ji, Yen- Jen Wang, Yucheng Hu, and Jianyu Chen. Improving vision- language-action model with online reinforcement learning. arXiv preprint arXiv:2501.16664, 2025. 2

  10. [10]

    Single image haze removal using dark channel prior.IEEE transactions on pat- tern analysis and machine intelligence, 33(12):2341–2353,

    Kaiming He, Jian Sun, and Xiaoou Tang. Single image haze removal using dark channel prior.IEEE transactions on pat- tern analysis and machine intelligence, 33(12):2341–2353,

  11. [11]

    Denoising diffu- sion probabilistic models

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models. InNeurIPS, 2020. 2

  12. [12]

    Cycle-consistent adversarial net- works for smoke detection and removal in endoscopic im- ages

    Zhisen Hu and Xiyuan Hu. Cycle-consistent adversarial net- works for smoke detection and removal in endoscopic im- ages. In2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), pages 3070–3073. IEEE, 2021. 2

  13. [13]

    Structure representation network and uncertainty feedback learning for dense non-uniform fog removal

    Yeying Jin, Wending Yan, Wenhan Yang, and Robby T Tan. Structure representation network and uncertainty feedback learning for dense non-uniform fog removal. InAsian Con- ference on Computer Vision, pages 155–172. Springer, 2022. 7, 8

  14. [14]

    Musiq: Multi-scale image quality transformer

    Junjie Ke, Qifei Wang, Yilin Wang, Peyman Milanfar, and Feng Yang. Musiq: Multi-scale image quality transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 5148–5157, 2021. 6

  15. [15]

    Denoising as adaptation: Noise- space domain adaptation for image restoration.arXiv preprint arXiv:2406.18516, 2024

    Kang Liao, Zongsheng Yue, Zhouxia Wang, and Chen Change Loy. Denoising as adaptation: Noise- space domain adaptation for image restoration.arXiv preprint arXiv:2406.18516, 2024. 1, 2, 6, 7

  16. [16]

    Reasoning physical video generation with diffusion timestep tokens via reinforcement learning.arXiv preprint arXiv:2504.15932, 2025

    Wang Lin, Liyu Jia, Wentao Hu, Kaihang Pan, Zhongqi Yue, Wei Zhao, Jingyuan Chen, Fei Wu, and Hanwang Zhang. Reasoning physical video generation with diffusion timestep tokens via reinforcement learning.arXiv preprint arXiv:2504.15932, 2025. 2

  17. [17]

    No-reference image quality assessment based on spatial and spectral entropies.Signal processing: Image communica- tion, 29(8):856–863, 2014

    Lixiong Liu, Bao Liu, Hua Huang, and Alan Conrad Bovik. No-reference image quality assessment based on spatial and spectral entropies.Signal processing: Image communica- tion, 29(8):856–863, 2014. 6

  18. [18]

    Mixdehazenet: Mix structure block for image dehazing net- work

    LiPing Lu, Qian Xiong, Bingrong Xu, and Duanfeng Chu. Mixdehazenet: Mix structure block for image dehazing net- work. In2024 International Joint Conference on Neural Net- works (IJCNN), pages 1–10. IEEE, 2024. 1, 2

  19. [19]

    Vision-based surgical field defogging.IEEE transactions on medical imaging, 36(10):2021–2030, 2017

    Xiongbiao Luo, A Jonathan McLeod, Stephen E Pautler, Christopher M Schlachta, and Terry M Peters. Vision-based surgical field defogging.IEEE transactions on medical imaging, 36(10):2021–2030, 2017. 8

  20. [20]

    arXiv preprint arXiv:2310.01018 , volume=

    Ziwei Luo, Fredrik K Gustafsson, Zheng Zhao, Jens Sj¨olund, and Thomas B Sch¨on. Controlling vision-language models for multi-task image restoration.arXiv preprint arXiv:2310.01018, 2023. 1, 2, 6

  21. [21]

    Segment anything in medical images.Nature communications, 15(1):654, 2024

    Jun Ma, Yuting He, Feifei Li, Lin Han, Chenyu You, and Bo Wang. Segment anything in medical images.Nature communications, 15(1):654, 2024. 8

  22. [22]

    completely blind

    Anish Mittal, Rajiv Soundararajan, and Alan C Bovik. Mak- ing a “completely blind” image quality analyzer.IEEE Sig- nal processing letters, 20(3):209–212, 2012. 6

  23. [23]

    Self-reflective reinforcement learning for diffusion-based image reasoning generation.arXiv preprint arXiv:2505.22407, 2025

    Jiadong Pan, Zhiyuan Ma, Kaiyan Zhang, Ning Ding, and Bowen Zhou. Self-reflective reinforcement learning for diffusion-based image reasoning generation.arXiv preprint arXiv:2505.22407, 2025. 2

  24. [24]

    Yirou Pan, Sophia Bano, Francisco Vasconcelos, Hyun Park, Taikyeong Ted Jeong, and Danail Stoyanov. Desmoke-lap: improved unpaired image-to-image translation for desmok- ing in laparoscopic surgery.International Journal of Com- puter Assisted Radiology and Surgery, 17(5):885–893, 2022. 2, 6, 7, 8, 1

  25. [25]

    Desmok- ing laparoscopy surgery images using an image-to-image translation guided by an embedded dark channel.IEEE Access, 8:208898–208909, 2020

    Sebasti ´an Salazar-Colores, Hugo Moreno Jim ´enez, C´esar Javier Ortiz-Echeverri, and Gerardo Flores. Desmok- ing laparoscopy surgery images using an image-to-image translation guided by an embedded dark channel.IEEE Access, 8:208898–208909, 2020. 2, 8

  26. [26]

    Improved techniques for training gans.Advances in neural information processing systems, 29, 2016

    Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans.Advances in neural information processing systems, 29, 2016. 6

  27. [27]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Y Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024.URL https://arxiv. org/abs/2402.03300, 2(3):5, 2024. 1, 2, 3

  28. [28]

    Amncutter: Affinity-attention-guided multi- view normalized cutter for unsupervised surgical instrument segmentation

    Mingyu Sheng, Jianan Fan, Dongnan Liu, Ron Kikinis, and Weidong Cai. Amncutter: Affinity-attention-guided multi- view normalized cutter for unsupervised surgical instrument segmentation. In2025 IEEE/CVF Winter Conference on Ap- plications of Computer Vision (WACV), pages 4533–4544. IEEE, 2025. 8

  29. [29]

    Generative smoke removal

    Oleksii Sidorov, Congcong Wang, and Faouzi Alaya Cheikh. Generative smoke removal. InMachine Learning for Health Workshop, pages 81–92. PMLR, 2020. 8

  30. [30]

    Score-Based Generative Modeling through Stochastic Differential Equations

    Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Ab- hishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equa- tions.arXiv preprint arXiv:2011.13456, 2020. 2

  31. [31]

    Vision transformers for single image dehazing.IEEE Transactions on Image Processing, 32:1927–1941, 2023

    Yuda Song, Zhuqing He, Hui Qian, and Xin Du. Vision transformers for single image dehazing.IEEE Transactions on Image Processing, 32:1927–1941, 2023. 7, 8

  32. [32]

    Multi-stages de-smoking model based on cyclegan for surgical de-smoking.International Journal of Machine Learning and Cybernetics, 14(11): 3965–3978, 2023

    Xinpei Su and Qiuxia Wu. Multi-stages de-smoking model based on cyclegan for surgical de-smoking.International Journal of Machine Learning and Cybernetics, 14(11): 3965–3978, 2023. 8

  33. [33]

    Unsupervised smoke to desmoked la- paroscopic surgery images using contrast driven cyclic- desmokegan.Computers in Biology and Medicine, 123: 103873, 2020

    Vishal Venkatesh, Neeraj Sharma, Vivek Srivastava, and Munendra Singh. Unsupervised smoke to desmoked la- paroscopic surgery images using contrast driven cyclic- desmokegan.Computers in Biology and Medicine, 123: 103873, 2020. 2

  34. [34]

    Variational based smoke removal in laparoscopic images.Biomedical engi- neering online, 17(1):139, 2018

    Congcong Wang, Faouzi Alaya Cheikh, Mounir Kaaniche, Azeddine Beghdadi, and Ole Jacob Elle. Variational based smoke removal in laparoscopic images.Biomedical engi- neering online, 17(1):139, 2018. 8

  35. [35]

    Surgical smoke re- moval via residual swin transformer network.International Journal of Computer Assisted Radiology and Surgery, 18(8): 1417–1427, 2023

    Feng Wang, Xinan Sun, and Jinhua Li. Surgical smoke re- moval via residual swin transformer network.International Journal of Computer Assisted Radiology and Surgery, 18(8): 1417–1427, 2023. 2

  36. [36]

    Simplear: Pushing the frontier of autoregressive visual generation through pretraining, sft, and rl

    Junke Wang, Zhi Tian, Xun Wang, Xinyu Zhang, Weilin Huang, Zuxuan Wu, and Yu-Gang Jiang. Simplear: Pushing the frontier of autoregressive visual generation through pre- training, sft, and rl.arXiv preprint arXiv:2504.11455, 2025. 2

  37. [37]

    Self-supervised video desmoking for laparoscopic surgery

    Renlong Wu, Zhilu Zhang, Shuohao Zhang, Longfei Gou, Haobin Chen, Lei Zhang, Hao Chen, and Wangmeng Zuo. Self-supervised video desmoking for laparoscopic surgery. InEuropean Conference on Computer Vision, pages 307–

  38. [38]

    Springer, 2024. 6, 7, 1

  39. [39]

    A new benchmark in vivo paired dataset for laparoscopic im- age de-smoking, 2024

    Wenyao Xia, Victoria Fan, Terry Peters, and Elvis CS Chen. A new benchmark in vivo paired dataset for laparoscopic im- age de-smoking, 2024. 4, 6

  40. [40]

    Td-sam: Temporal and distance-guided adaptations of sam for accurate surgical in- strument segmentation.IEEE Journal of Biomedical and Health Informatics, 2025

    Cheng Xue, Shiyu Zhao, Danqiong Wang, Cheng Chen, Guanyu Yang, and Yang Chen. Td-sam: Temporal and distance-guided adaptations of sam for accurate surgical in- strument segmentation.IEEE Journal of Biomedical and Health Informatics, 2025. 8

  41. [41]

    No-Reference Quality Assessment of Contrast-Distorted Images using Contrast Enhancement

    Jia Yan, Jie Li, and Xin Fu. No-reference quality assess- ment of contrast-distorted images using contrast enhance- ment.arXiv preprint arXiv:1904.08879, 2019. 5

  42. [42]

    Maniqa: Multi-dimension attention network for no-reference image quality assessment

    Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, and Yujiu Yang. Maniqa: Multi-dimension attention network for no-reference image quality assessment. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1191–1200, 2022. 6

  43. [43]

    Seg-r1: Segmentation can be surprisingly simple with reinforcement 33 ConceptSeg-R1 learning

    Zuyao You and Zuxuan Wu. Seg-r1: Segmentation can be surprisingly simple with reinforcement learning.arXiv preprint arXiv:2506.22624, 2025. 2

  44. [44]

    Progressive frequency-aware network for laparo- scopic image desmoking

    Jiale Zhang, Wenfeng Huang, Xiangyun Liao, and Qiong Wang. Progressive frequency-aware network for laparo- scopic image desmoking. InChinese Conference on Pattern Recognition and Computer Vision (PRCV), pages 479–492. Springer, 2023. 6, 7, 1

  45. [45]

    Blind image quality assessment via vision- language correspondence: A multitask learning perspective

    Weixia Zhang, Guangtao Zhai, Ying Wei, Xiaokang Yang, and Kede Ma. Blind image quality assessment via vision- language correspondence: A multitask learning perspective. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 14071–14081, 2023. 5

  46. [46]

    Ef- ficient dual-domain image dehazing with haze prior percep- tion.arXiv preprint arXiv:2507.11035, 2025

    Lirong Zheng, Yanshan Li, Rui Yu, and Kaihao Zhang. Ef- ficient dual-domain image dehazing with haze prior percep- tion.arXiv preprint arXiv:2507.11035, 2025. 6, 7, 1

  47. [47]

    Learning to prompt for vision-language models.In- ternational Journal of Computer Vision, 130(9):2337–2348,

    Kaiyang Zhou, Jingkang Yang, Chen Change Loy, and Ziwei Liu. Learning to prompt for vision-language models.In- ternational Journal of Computer Vision, 130(9):2337–2348,

  48. [48]

    Yichao Zhou, Zhisen Hu, Zuxing Xuan, Yangang Wang, and Xiyuan Hu. Synchronizing detection and removal of smoke in endoscopic images with cyclic consistency adver- sarial nets.IEEE/ACM Transactions on Computational Biol- ogy and Bioinformatics, 21(4):670–680, 2022. 2 PhySe-RPO: Physics and Semantics Guided Relative Policy Optimization for Diffusion-Based S...

  49. [49]

    Results on Synthetic Datasets To verify that our model has basic desmoking capabilities at cold start, we tested it on a synthetic dataset

    Additional Experimental Results 6.1. Results on Synthetic Datasets To verify that our model has basic desmoking capabilities at cold start, we tested it on a synthetic dataset. The exper- imental results are shown in Table 6,our method achieves the best overall performance across all four evaluation met- rics, demonstrating strong desmoking ability even a...