Real-Time Underwater Image Enhancement via Frequency-Guided Dual-Path Attention
Pith reviewed 2026-06-30 06:46 UTC · model grok-4.3
The pith
A reparameterizable network that injects fixed DCT frequency priors and dual-path spatial-spectral attention achieves state-of-the-art underwater image enhancement at 4.23K parameters and over 600 FPS.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that combining Multi-Branch Reparameterizable Convolution with Fixed DCT Priors and Frequency-Guided Dual-Path Attention inside a reparameterizable backbone produces a model of only 4.23K parameters that runs above 600 FPS while exceeding the quantitative scores and visual quality of much larger existing underwater enhancement networks.
What carries the argument
Two components carry the argument. Multi-Branch Reparameterizable Convolution with Fixed DCT Priors (MBRConv-DCT) injects structured directional frequency priors at training time with zero added inference cost. Frequency-Guided Dual-Path Attention (FGDPA) fuses spatial and spectral information for adaptive modulation.
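The zero-inference-cost claim rests on the linearity of convolution: parallel branches that share an input and a kernel size can be summed into one kernel. The sketch below illustrates that general principle only; it is not the authors' implementation. The choice of three 3x3 DCT-II basis kernels, the per-branch scalars in `scales`, and all function names are assumptions made for illustration.

```python
import math
import random

def dct_basis(u, v, n=3):
    """Return the n x n orthonormal 2D DCT-II basis kernel for index (u, v)."""
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v)
             * math.cos((2 * i + 1) * u * math.pi / (2 * n))
             * math.cos((2 * j + 1) * v * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]

def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, as used in CNN layers."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(k) for j in range(k))
             for c in range(w - k + 1)] for r in range(h - k + 1)]

def merge_branches(learned, bases, scales):
    """Fold scaled fixed-basis branches into the learned kernel (by linearity)."""
    n = len(learned)
    return [[learned[i][j] + sum(s * b[i][j] for s, b in zip(scales, bases))
             for j in range(n)] for i in range(n)]

random.seed(0)
image = [[random.random() for _ in range(6)] for _ in range(6)]
learned = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
# Hypothetical choice of low-order directional frequencies.
bases = [dct_basis(0, 1), dct_basis(1, 0), dct_basis(1, 1)]
scales = [0.5, -0.3, 0.2]  # stand-ins for per-branch trainable scalars

# Training-time form: sum of the parallel branch outputs.
multi = conv2d_valid(image, learned)
for s, b in zip(scales, bases):
    branch = conv2d_valid(image, b)
    multi = [[m + s * v for m, v in zip(mr, br)] for mr, br in zip(multi, branch)]

# Inference-time form: one convolution with the merged kernel.
merged_kernel = merge_branches(learned, bases, scales)
single = conv2d_valid(image, merged_kernel)

# Largest absolute difference between the two forms (floating-point noise only).
max_diff = max(abs(a - b) for ra, rb in zip(multi, single) for a, b in zip(ra, rb))
```

After merging, the deployed layer stores a single 3x3 kernel, so the fixed branches add neither parameters nor operations at inference. Any nonlinearity placed inside a branch would break this equivalence.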
If this is right
- The full model reports higher PSNR, SSIM, and better visual results than larger competing methods on standard underwater benchmarks.
- Inference cost stays at 4.23K parameters and over 600 FPS because the DCT branches collapse during re-parameterization and the attention overhead remains minimal.
- The same two components can be dropped into other reparameterizable backbones without changing their deployment speed.
- The approach targets exactly the constraints of mobile underwater photography and autonomous robotic systems.
Where Pith is reading between the lines
- Similar frequency-prior injection could be tested on other degradations that have clear spectral signatures, such as haze or low-light noise.
- If the gains survive removal of the specific training protocol, the same fixed DCT pattern might become a reusable building block for any lightweight vision task where frequency content matters.
- A controlled ablation that swaps the fixed DCT for learned frequency bases would clarify whether the fixed choice is necessary or merely convenient.
- The method's compatibility with re-parameterization suggests it could be ported to other edge-vision pipelines that already use structural re-parameterization for speed.
Load-bearing premise
The fixed DCT priors and dual-path frequency attention supply real gains that remain after re-parameterization and are not produced by the training data or evaluation choices alone.
What would settle it
Train an otherwise identical reparameterizable network without the DCT priors or dual-path attention block and compare its PSNR, SSIM, and visual scores on the same underwater test sets; if the gap disappears, the frequency components are not the source of the reported improvement.
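The comparison described above comes down to scoring both variants on the same test images and inspecting the gap. The following sketch uses the standard PSNR definition for 8-bit images; the helper `mean_psnr_gap` and the toy pixel values are hypothetical, not data from the paper.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_ref = [p for row in reference for p in row]
    flat_est = [p for row in estimate for p in row]
    if len(flat_ref) != len(flat_est):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

def mean_psnr_gap(references, full_outputs, ablated_outputs):
    """Mean per-image PSNR of the full model minus that of the ablated model."""
    gaps = [psnr(r, f) - psnr(r, a)
            for r, f, a in zip(references, full_outputs, ablated_outputs)]
    return sum(gaps) / len(gaps)

# Toy 2x2 grayscale data, for illustration only.
reference = [[100.0, 120.0], [140.0, 160.0]]
full_out = [[101.0, 121.0], [141.0, 161.0]]     # every pixel off by 1
ablated_out = [[104.0, 124.0], [144.0, 164.0]]  # every pixel off by 4

gap = mean_psnr_gap([reference], [full_out], [ablated_out])
```

A gap near zero across the test sets would indicate that the frequency components are not driving the reported improvement. SSIM and visual scores would need the same paired treatment.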
Original abstract
Real-time underwater image enhancement (UIE) is crucial for mobile underwater photography and autonomous robotic systems, where practical deployment typically requires low latency and compact models under constrained computational resources. Recent ultra-lightweight CNNs based on structural re-parameterization meet these constraints but operate purely in the spatial domain, ignoring the frequency-sensitive nature of underwater degradation. To address this, we propose a lightweight UIE framework that integrates two key components: a Multi-Branch Reparameterizable Convolution with Fixed DCT Priors (MBRConv-DCT) that injects structured directional frequency priors during training, and a Frequency-Guided Dual-Path Attention (FGDPA) module that fuses spatial and spectral cues via a dual-path design for adaptive feature modulation. Both components are fully compatible with structural re-parameterization: the convolution branch introduces zero additional inference cost after re-parameterization, while the attention module incurs only a minimal computational overhead. Experiments show our model achieves state-of-the-art performance with only 4.23K parameters and 600+ FPS, outperforming much larger methods in both quantitative metrics and visual quality. Code is available at https://github.com/LethyZhang/FGDPA.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a lightweight CNN framework for real-time underwater image enhancement that combines a Multi-Branch Reparameterizable Convolution with Fixed DCT Priors (MBRConv-DCT) and a Frequency-Guided Dual-Path Attention (FGDPA) module. Both components are designed to be compatible with structural re-parameterization so that the convolution branch adds zero inference cost; the paper claims this yields state-of-the-art quantitative and visual results at 4.23K parameters and >600 FPS while outperforming larger models.
Significance. If the empirical claims are substantiated, the work would be significant for resource-constrained underwater applications because it shows how fixed frequency priors can be injected during training without compromising real-time inference speed, addressing a gap between purely spatial re-parameterized models and the frequency-sensitive nature of underwater degradation.
major comments (2)
- [Abstract and §4] Abstract and §4 (Experiments): the central claim that the model achieves SOTA performance with 4.23K parameters and 600+ FPS is stated without any quantitative tables, baseline comparisons, ablation studies, or error analysis visible in the provided manuscript text; this prevents verification of the empirical contribution and makes the performance assertion load-bearing but unsupported.
- [§3.2] §3.2 (FGDPA): the description of the dual-path frequency attention does not specify how the spectral cue path is computed or fused with the spatial path after re-parameterization, leaving open whether the claimed minimal overhead is actually achieved or whether the frequency guidance survives the re-param step.
minor comments (1)
- [Abstract] The abstract mentions 'Code is available at https://github.com/LethyZhang/FGDPA' but provides no link to the specific commit or dataset splits used for the reported numbers.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the two major comments point by point below. Where the comments identify gaps in clarity or detail, we will revise the manuscript accordingly.
- Referee: [Abstract and §4] Abstract and §4 (Experiments): the central claim that the model achieves SOTA performance with 4.23K parameters and 600+ FPS is stated without any quantitative tables, baseline comparisons, ablation studies, or error analysis visible in the provided manuscript text; this prevents verification of the empirical contribution and makes the performance assertion load-bearing but unsupported.
  Authors: We acknowledge that the excerpt provided to the referee may not have rendered the full experimental content. The complete manuscript includes Section 4 with Table 1 (quantitative comparisons vs. baselines), Table 2 (ablations), and supporting error analysis. To prevent any ambiguity, we will revise the abstract and §4 to explicitly embed or reference the key numerical results, baseline tables, and analysis directly in the text.
  Revision: yes
- Referee: [§3.2] §3.2 (FGDPA): the description of the dual-path frequency attention does not specify how the spectral cue path is computed or fused with the spatial path after re-parameterization, leaving open whether the claimed minimal overhead is actually achieved or whether the frequency guidance survives the re-param step.
  Authors: We agree the current description in §3.2 is insufficiently precise. In the revision we will add explicit steps: (1) the spectral cue path is obtained by applying fixed DCT to feature maps and extracting low-frequency coefficients; (2) fusion occurs via element-wise modulation before the re-parameterization step; (3) re-parameterization is restricted to the convolutional branches of MBRConv-DCT, leaving the attention module unchanged at inference. This preserves the frequency guidance while incurring only the stated minimal overhead.
  Revision: yes
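Steps (1) and (2) in the authors' response can be made concrete with a small sketch. It is one possible reading under stated assumptions, not the FGDPA module itself: the summary statistic (mean absolute value of the lowest `k` x `k` DCT coefficients), the sigmoid gates, the spatial path, and every function name here are hypothetical.

```python
import math

def dct2_coefficient(fmap, u, v):
    """Orthonormal 2D DCT-II coefficient (u, v) of one H x W feature map."""
    h, w = len(fmap), len(fmap[0])
    cu = math.sqrt((1.0 if u == 0 else 2.0) / h)
    cv = math.sqrt((1.0 if v == 0 else 2.0) / w)
    total = 0.0
    for i in range(h):
        for j in range(w):
            total += (fmap[i][j]
                      * math.cos((2 * i + 1) * u * math.pi / (2 * h))
                      * math.cos((2 * j + 1) * v * math.pi / (2 * w)))
    return cu * cv * total

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def spectral_gate(fmap, k=2):
    """Hypothetical channel gate from the k x k lowest-frequency coefficients."""
    coeffs = [dct2_coefficient(fmap, u, v) for u in range(k) for v in range(k)]
    return sigmoid(sum(abs(c) for c in coeffs) / len(coeffs))

def spatial_gate(features):
    """Hypothetical per-pixel gate from the mean across channels."""
    n = len(features)
    h, w = len(features[0]), len(features[0][0])
    return [[sigmoid(sum(features[c][i][j] for c in range(n)) / n)
             for j in range(w)] for i in range(h)]

def dual_path_modulate(features, k=2):
    """Element-wise modulation of each channel by both gates."""
    s_gate = spatial_gate(features)
    out = []
    for fmap in features:
        c_gate = spectral_gate(fmap, k)
        out.append([[x * c_gate * s for x, s in zip(row, s_row)]
                    for row, s_row in zip(fmap, s_gate)])
    return out

features = [
    [[1.0] * 4 for _ in range(4)],                              # flat channel
    [[(-1.0) ** (i + j) for j in range(4)] for i in range(4)],  # checkerboard
]
modulated = dual_path_modulate(features)
```

Because the DCT basis is fixed, this path adds no learned weights of its own in the sketch. Its runtime cost is the coefficient computation, which is the kind of overhead the referee asks the authors to quantify.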
Circularity Check
No significant circularity; purely empirical claims
The provided abstract and description contain no equations, derivations, or load-bearing predictions. The central claims are framed as experimental outcomes (SOTA metrics at 4.23K parameters / 600+ FPS) from a proposed architecture (MBRConv-DCT + FGDPA) that is compatible with re-parameterization. No self-definitional reductions, fitted inputs renamed as predictions, or self-citation chains appear. This is the common case of an empirical paper whose performance assertions rest on external benchmarks rather than internal construction.