Retinex Meets Language: A Physics-Semantics-Guided Underwater Image Enhancement Network

Chao Huang; Junyu Dong; Shixuan Xu; Xinghui Dong; Yabo Liu

arxiv: 2603.07076 · v2 · submitted 2026-03-07 · 💻 cs.CV


Pith reviewed 2026-05-15 14:52 UTC · model grok-4.3

classification 💻 cs.CV

keywords: underwater image enhancement · Retinex model · CLIP guidance · multimodal dataset · semantic consistency loss · image restoration · physics-informed learning


The pith

Coupling Retinex illumination correction with CLIP language guidance matches or outperforms existing underwater image enhancement methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a network that uses Retinex theory to estimate illumination without hand-crafted priors and adds semantic guidance from a vision-language model to restore underwater images. It targets two weaknesses of existing methods, the rigid assumptions of physics-based priors and the data scarcity of learning-based approaches, and addresses the latter by building a new multimodal dataset of image-reference-text triplets. The method optimizes for both physical correction and semantic consistency, and the authors report results that are superior or comparable to existing techniques on several datasets. Readers might care because improved underwater visibility supports applications in oceanography and underwater vehicles. The core idea is that language provides high-level cues that physics alone cannot supply in degraded scenes.

Core claim

The paper establishes that a Physics-Semantics-Guided Underwater Image Enhancement Network, which combines a Prior-Free Illumination Estimator grounded in Retinex with a Semantics-Guided Image Restorer that uses CLIP-generated textual descriptions, can achieve superior enhancement by enforcing semantic consistency through a dedicated loss on a newly constructed dataset of 6,418 image-reference-text triplets.

What carries the argument

The Semantics-Guided Image Restorer that leverages CLIP textual descriptions to inject high-level semantics for guiding the restoration process.

If this is right

Enhanced correction of color distortion and low contrast without relying on strict physical assumptions.
Improved generalization to varied underwater conditions through multimodal training.
New capability to measure and optimize semantic alignment between images and text in enhancement tasks.
Comparable or better performance against fifteen existing methods on public datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar integration of physical models with language guidance could be tested on other image degradation problems like haze or night scenes.
The constructed dataset may serve as a benchmark for future multimodal underwater vision research.
Using different vision-language models beyond CLIP might yield further gains if the semantic guidance is the key factor.

Load-bearing premise

That the textual descriptions produced by CLIP provide accurate and useful perceptual guidance for restoring details in underwater images.

What would settle it

Running the enhancement on a set of underwater images paired with deliberately incorrect or mismatched textual descriptions and checking if performance drops significantly compared to correct descriptions.
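The mismatched-caption test above can be set up with a derangement, a permutation that leaves no image paired with its own caption. The sketch below is a hypothetical Python helper, not part of the paper: the function names are illustrative, and the enhancement call and the metric that produces `matched_scores` and `mismatched_scores` (for example per-image PSNR against the reference) are assumptions left to the experimenter.

```python
import random


def derangement(n, seed=0):
    """Return a random permutation of range(n) with no fixed points."""
    if n < 2:
        raise ValueError("need at least two captions to mismatch them")
    rng = random.Random(seed)
    while True:  # rejection sampling; about e (~2.72) shuffles on average
        perm = list(range(n))
        rng.shuffle(perm)
        if all(i != p for i, p in enumerate(perm)):
            return perm


def mismatched_captions(captions, seed=0):
    """Give every image the caption of some other image."""
    return [captions[p] for p in derangement(len(captions), seed)]


def mean_drop(matched_scores, mismatched_scores):
    """Mean per-image metric drop when captions are swapped (positive = worse)."""
    if len(matched_scores) != len(mismatched_scores) or not matched_scores:
        raise ValueError("score lists must be non-empty and equal in length")
    diffs = [m - x for m, x in zip(matched_scores, mismatched_scores)]
    return sum(diffs) / len(diffs)
```

A clearly positive `mean_drop` on a higher-is-better metric would suggest the restorer actually uses the caption content; a drop near zero would suggest the language branch contributes little.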

Figures

Figures reproduced from arXiv: 2603.07076 by Chao Huang, Junyu Dong, Shixuan Xu, Xinghui Dong, Yabo Liu.

**Figure 2.** Sixteen degraded-reference-text triplets contained in the LUIQD-TD. Each triplet consists of three components: a degraded image (top-left) and the …

**Figure 3.** Statistical analysis of the textual annotations in the LUIQD-TD, including (a) the distribution of word frequencies, (b) the distribution of caption …

**Figure 4.** The architecture of our PSG-UIENet, which comprises two modules: (a) a Prior-Free Illumination Estimator that generates multi-scale light-enhanced …

**Figure 5.** The architecture of the Semantics-Guided Encoder-Decoder Network. This network is built on top of a symmetric encoder-decoder network, which …

**Figure 6.** The results produced by 15 baselines and our method in terms of a degraded image in the Test-L622 test set. Here, the PSNR, SSIM and LPIPS …

**Figure 7.** The results produced by 15 baselines and our method in terms of a degraded image in the Test-U80 test set. Here, the PSNR, SSIM and LPIPS …

**Figure 8.** The results produced by 15 baselines and our method in terms of a degraded image in the Test-S110 test set. Here, the PSNR, SSIM and LPIPS …

**Figure 9.** The results produced by 15 baselines and our method in terms of a degraded image in the Test-C60 test set. Here, both the PAUQA and UIF values …

**Figure 10.** The results produced by 15 baselines and our method in terms of a degraded image in the Test-R53 test set. Here, both the PAUQA and UIF values …

Original abstract

Underwater images often suffer from severe degradation caused by light absorption and scattering, leading to color distortion, low contrast and reduced visibility. Existing Underwater Image Enhancement (UIE) methods can be divided into two categories, i.e., prior-based and learning-based methods. The former rely on rigid physical assumptions that limit the adaptability, while the latter often face data scarcity and weak generalization. To address these issues, we propose a Physics-Semantics-Guided Underwater Image Enhancement Network (PSG-UIENet), which couples the Retinex-grounded illumination correction with the language-informed guidance. This network comprises a Prior-Free Illumination Estimator and a Semantics-Guided Image Restorer. In particular, the restorer leverages the textual descriptions generated by the Contrastive Language-Image Pre-training (CLIP) model to inject high-level semantics for perceptually meaningful guidance. Since multimodal UIE data sets are not publicly available, we also construct a large-scale image-text UIE data set, namely, LUIQD-TD, which contains 6,418 image-reference-text triplets. To explicitly measure and optimize semantic consistency between textual descriptions and images, we further design an Image-Text Semantic Similarity (ITSS) loss function. To our knowledge, this study makes the first effort to introduce both textual guidance and the multimodal data set into UIE tasks. Extensive experiments on our data set and four publicly available data sets demonstrate that the proposed PSG-UIENet achieves superior or comparable performance against fifteen state-of-the-art methods.
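The abstract does not give the ITSS formulation. A minimal sketch, assuming the loss is one minus the cosine similarity between an image embedding and its caption embedding (for instance from CLIP's image and text encoders), averaged over a batch. The name `itss_loss` and its embeddings-in, scalar-out interface are hypothetical; a training implementation would live in a differentiable framework rather than NumPy.

```python
import numpy as np


def itss_loss(image_emb, text_emb, eps=1e-8):
    """Hypothetical ITSS-style loss: mean of (1 - cosine similarity) over a batch.

    image_emb, text_emb: arrays of shape (batch, dim) holding, e.g., CLIP
    embeddings of the enhanced images and of their paired captions (assumed).
    Returns 0 for perfectly aligned pairs and up to 2 for opposite directions.
    """
    image_emb = np.asarray(image_emb, dtype=np.float64)
    text_emb = np.asarray(text_emb, dtype=np.float64)
    img = image_emb / (np.linalg.norm(image_emb, axis=1, keepdims=True) + eps)
    txt = text_emb / (np.linalg.norm(text_emb, axis=1, keepdims=True) + eps)
    cosine = np.sum(img * txt, axis=1)  # one similarity per image-text pair
    return float(np.mean(1.0 - cosine))
```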

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper pairs a Retinex illumination estimator with CLIP text guidance and releases a new multimodal underwater dataset, but the gains may rest more on training details than on the language component actually helping.


The main contribution is the attempt to inject high-level semantics from CLIP into a Retinex pipeline for underwater enhancement, together with the new LUIQD-TD dataset of 6,418 image-reference-text triplets and the ITSS loss that enforces image-text consistency. The combination is presented as the first of its kind in this niche, and the architecture description is clear enough on how the Prior-Free Illumination Estimator and the Semantics-Guided Image Restorer are meant to work together. The motivation, data scarcity and weak generalization in existing UIE methods, is straightforward and reasonable. Experiments are claimed on the new dataset plus four public ones, beating or matching fifteen prior methods, which at least shows that a broad comparison was run. The dataset release itself is a concrete positive if the triplets are well annotated and diverse.

The soft spot is the CLIP component. CLIP was trained largely on clear, terrestrial web imagery, so its embeddings and generated captions for heavily degraded underwater inputs could easily be noisy or generic rather than perceptually useful. Without ablations that isolate the ITSS loss or the language branch, it is hard to tell whether the reported improvements come from the semantic guidance or from other factors such as network capacity and training choices. The abstract gives no quantitative tables or error breakdowns, so the robustness of the central performance claim is hard to evaluate.

The paper is aimed at researchers working on underwater vision or physics-informed multimodal restoration. A reader in that area would find the dataset and the loss design worth examining, even if the benefit of the language branch needs more proof. I would send it to peer review so the numbers and the relevance of CLIP can be checked directly.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes PSG-UIENet for underwater image enhancement, coupling a Prior-Free Illumination Estimator grounded in Retinex theory with a Semantics-Guided Image Restorer that injects high-level semantics via CLIP-generated textual descriptions. The authors introduce the LUIQD-TD multimodal dataset (6,418 image-reference-text triplets) and an ITSS loss to enforce image-text semantic consistency, claiming this is the first integration of textual guidance and multimodal data into UIE tasks. Extensive experiments reportedly show superior or comparable results to 15 state-of-the-art methods on the new dataset plus four public benchmarks.

Significance. If the performance gains prove robust and the semantic component demonstrably contributes beyond standard Retinex or CNN baselines, the work would advance UIE by bridging physical priors with multimodal language models, directly addressing data scarcity and generalization limitations. The LUIQD-TD dataset is a concrete community resource, and the explicit ITSS loss provides a falsifiable mechanism for semantic alignment; these elements strengthen the contribution even if the absolute gains are incremental.

major comments (2)

[§3.2] Semantics-Guided Image Restorer: the reliance on CLIP-generated captions for perceptually meaningful guidance is central to the novelty claim, yet the manuscript provides no quantitative validation (e.g., caption accuracy metrics or a human study) that CLIP embeddings remain informative under severe underwater color casts and low contrast; without this, the ITSS loss may optimize toward generic or domain-shifted text rather than task-relevant semantics.
[§4] Experiments: the central performance claim against 15 SOTA methods is load-bearing, but the manuscript must include full quantitative tables (PSNR, SSIM, UIQM, etc.) on all five datasets, ablation studies isolating the ITSS loss and the CLIP component, and an error analysis with failure cases; the abstract alone supplies no such evidence, so it cannot be verified that the gains are not due to post-hoc dataset choices or missing baselines.
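For the full quantitative tables requested above, PSNR is the simplest of the listed full-reference metrics. A minimal NumPy sketch, assuming both images are float arrays scaled to [0, data_range]; SSIM, LPIPS and UIQM need considerably more machinery and are not shown.

```python
import numpy as np


def psnr(reference, enhanced, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Both images are assumed to hold values in [0, data_range].
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(enhanced, dtype=np.float64)
    if ref.shape != out.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

A per-dataset table would then report the mean of `psnr` over each test split; the paper's own evaluation protocol (color space, cropping, value range) is not stated in this review.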

minor comments (2)

Notation for the ITSS loss and the precise formulation of how CLIP text embeddings are fused into the restorer should be clarified with an explicit equation or diagram to avoid ambiguity in reproduction.
The manuscript should add a brief related-work subsection contrasting the proposed Retinex+CLIP approach with prior multimodal or language-guided enhancement methods outside UIE to strengthen the novelty positioning.
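The review does not say how the text embeddings enter the restorer. One common mechanism is FiLM-style feature-wise modulation (Perez et al., 2018), in which a text embedding is projected to a per-channel scale and shift. The sketch below illustrates only that mechanism under assumed shapes; `film_fuse` and its weight arguments are hypothetical and should not be read as PSG-UIENet's actual fusion.

```python
import numpy as np


def film_fuse(features, text_emb, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM-style modulation of a feature map by a text embedding (illustrative).

    features: (C, H, W) restorer feature map (assumed layout)
    text_emb: (D,) sentence embedding, e.g. from a CLIP text encoder (assumed)
    w_gamma, w_beta: (C, D) projection weights; b_gamma, b_beta: (C,) biases
    """
    gamma = w_gamma @ text_emb + b_gamma  # per-channel scale, shape (C,)
    beta = w_beta @ text_emb + b_beta     # per-channel shift, shape (C,)
    # Broadcast (C,) -> (C, 1, 1) so each channel is scaled and shifted uniformly.
    return gamma[:, None, None] * features + beta[:, None, None]
```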

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and will revise the manuscript to incorporate additional validation and expanded experimental details where needed.


Referee: [§3.2] Semantics-Guided Image Restorer: the reliance on CLIP-generated captions for perceptually meaningful guidance is central to the novelty claim, yet the manuscript provides no quantitative validation (e.g., caption accuracy metrics or human study) that CLIP embeddings remain informative under severe underwater color-cast and low-contrast conditions; without this, the ITSS loss may optimize to generic or domain-shifted text rather than task-relevant semantics.

Authors: We agree that direct quantitative validation of CLIP caption quality specifically on underwater images would strengthen the central novelty claim. The current manuscript relies on end-to-end performance gains and the ITSS loss to demonstrate utility, but we will add a new subsection with caption accuracy metrics (e.g., CLIP similarity scores between generated texts and reference descriptions) computed on a held-out subset of LUIQD-TD, along with qualitative examples of captions under varying degradation levels. This revision will directly address whether the embeddings remain informative. revision: yes
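One way to turn the CLIP similarity scores mentioned in this response into a caption accuracy metric is top-1 retrieval: for each generated caption, check whether its most similar reference description is the one paired with the same image. The sketch assumes both sets of embeddings were already computed by a CLIP text encoder and are row-aligned; the function name is hypothetical.

```python
import numpy as np


def top1_caption_accuracy(generated_emb, reference_emb):
    """Share of generated captions whose nearest reference description
    (by cosine similarity) belongs to the same image.

    generated_emb, reference_emb: (N, D) arrays, row i describing image i.
    """
    g = np.asarray(generated_emb, dtype=np.float64)
    r = np.asarray(reference_emb, dtype=np.float64)
    g = g / np.linalg.norm(g, axis=1, keepdims=True)
    r = r / np.linalg.norm(r, axis=1, keepdims=True)
    sim = g @ r.T  # (N, N) cosine-similarity matrix
    hits = np.argmax(sim, axis=1) == np.arange(len(g))
    return float(np.mean(hits))
```

Chance level is 1/N, so the score is most informative when reported alongside that baseline and broken down by degradation level.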
Referee: [§4] Experiments: the central performance claim against 15 SOTA methods is load-bearing, but the manuscript must include full quantitative tables (PSNR, SSIM, UIQM, etc.) on all five datasets, ablation studies isolating the ITSS loss and CLIP component, and error analysis or failure cases; the abstract alone supplies no such evidence, preventing verification that gains are not due to post-hoc dataset choices or missing baselines.

Authors: We acknowledge that the experimental section requires fuller documentation to support the claims. In the revised manuscript we will expand Section 4 with complete tables reporting PSNR, SSIM, UIQM and additional metrics across all five datasets (LUIQD-TD plus the four public benchmarks), dedicated ablation tables isolating the ITSS loss and CLIP semantic injection, and a new error-analysis subsection that discusses representative failure cases. These additions will allow independent verification that observed gains arise from the proposed components. revision: yes

Circularity Check

0 steps flagged

No derivation reduces to fitted inputs by construction; the ITSS loss and CLIP guidance remain external.

Full rationale

The paper introduces a Prior-Free Illumination Estimator grounded in Retinex and a Semantics-Guided Image Restorer that injects CLIP-generated text via a newly defined ITSS loss. Neither component is shown to be fitted to a subset of the target outputs and then re-predicted; the multimodal dataset LUIQD-TD is constructed externally rather than derived from model parameters. Standard Retinex decomposition is invoked without self-referential redefinition of illumination or reflectance terms. Any self-citations are peripheral and do not carry the central claim of first multimodal UIE integration. The reported performance gains are therefore not forced by internal re-labeling of fitted quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

Central claim rests on Retinex as a flexible illumination model, CLIP providing useful semantics for underwater scenes, and the new dataset being large enough to train without overfitting; no free parameters are named in the abstract.

axioms (2)

Domain assumption: the Retinex model can be used for illumination correction without rigid physical assumptions.
Role: basis for the Prior-Free Illumination Estimator component.
Domain assumption: CLIP text embeddings supply perceptually meaningful guidance for image restoration.
Role: core premise of the Semantics-Guided Image Restorer.

invented entities (2)

LUIQD-TD dataset (no independent evidence)
Purpose: provide large-scale image-reference-text triplets for multimodal UIE training.
Newly constructed collection of 6,418 triplets; no external validation mentioned.
ITSS loss (no independent evidence)
Purpose: explicitly optimize semantic consistency between images and text descriptions.
Newly designed loss function; details of the formulation are absent from the abstract.

pith-pipeline@v0.9.0 · 5589 in / 1425 out tokens · 48675 ms · 2026-05-15T14:52:03.658370+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · relation: unclear
Relation between the paper passage and the cited Recognition theorem.

Retinex theory decomposes an image into reflectance and illumination... $I_{\mathrm{deg}} = (R + \hat{R}) \odot (L + \hat{L})$
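The perturbed Retinex equation quoted above can be checked numerically. The sketch reads R̂ and L̂ as additive perturbations of reflectance and illumination (an assumption, since the quoted passage is truncated) and uses toy random arrays; expanding the element-wise product separates the clean term R ⊙ L from a corruption term. The helper name `retinex_degrade` is hypothetical.

```python
import numpy as np


def retinex_degrade(R, L, R_hat, L_hat):
    """Perturbed Retinex model: I_deg = (R + R_hat) ⊙ (L + L_hat), element-wise."""
    return (R + R_hat) * (L + L_hat)


rng = np.random.default_rng(0)
R = rng.uniform(0.2, 1.0, size=(4, 4, 3))    # toy reflectance, RGB
L = rng.uniform(0.1, 0.5, size=(4, 4, 1))    # toy illumination, broadcast over RGB
R_hat = 0.05 * rng.standard_normal(R.shape)  # assumed reflectance perturbation
L_hat = 0.02 * rng.standard_normal(L.shape)  # assumed illumination perturbation

I_deg = retinex_degrade(R, L, R_hat, L_hat)
I_clean = R * L                                  # unperturbed Retinex image
corruption = R * L_hat + R_hat * (L + L_hat)     # all remaining cross terms
assert np.allclose(I_deg, I_clean + corruption)  # (R+R̂)(L+L̂) = RL + RL̂ + R̂(L+L̂)
```

Under this reading, enhancement amounts to estimating the illumination and suppressing the corruption term so that the reflectance can be recovered; how the Prior-Free Illumination Estimator does this is not detailed in the review.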

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

