pith. machine review for the scientific record.

arxiv: 2604.16266 · v1 · submitted 2026-04-17 · 💻 cs.CV

Recognition: unknown

Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement

Shivarth Rai, Tejeswar Pokuri

Pith reviewed 2026-05-10 08:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords underwater image enhancement · Mamba · dual-domain learning · SS2D blocks · color restoration · FFT spectral domain · image restoration · state space models

The pith

Hero-Mamba processes RGB images and FFT components in parallel with Mamba blocks to decouple color distortions from texture loss in underwater photos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a Mamba-based architecture can overcome the range limits of CNNs and the quadratic cost of Transformers when restoring underwater images degraded by absorption and scattering. It feeds the network both the spatial RGB view and the spectral FFT view at the same time so that color and brightness factors separate from texture and noise factors. A ColorFusion block then uses a background-light prior to put accurate color back in. Readers would care because this setup delivers linear-complexity global modeling that works on high-resolution inputs and yields measurable gains on standard benchmarks.
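The dual-domain input itself is straightforward to reproduce. A minimal sketch, assuming the spectral view is the per-channel 2-D FFT split into amplitude and phase (the paper does not pin down the exact spectral representation, so that split is an assumption here):

```python
import numpy as np

def dual_domain_views(rgb):
    """Build the two parallel inputs the paper describes: the spatial
    view (the RGB array itself) and a spectral view from its 2-D FFT,
    stored here as per-channel amplitude and phase."""
    spatial = rgb.astype(np.float64)
    spectrum = np.fft.fft2(spatial, axes=(0, 1))   # per-channel 2-D FFT
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    return spatial, amplitude, phase

# toy 4x4 RGB image
img = np.random.rand(4, 4, 3)
spatial, amp, ph = dual_domain_views(img)

# the FFT is invertible, so the spectral view loses no information
recon = np.real(np.fft.ifft2(amp * np.exp(1j * ph), axes=(0, 1)))
print(np.allclose(recon, spatial))  # True
```

Because the transform is lossless, whatever separation the network achieves must come from how the branches process the two views, not from the representation alone.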

Core claim

Hero-Mamba is a Mamba-based network for underwater image enhancement that processes spatial-domain RGB images and spectral-domain FFT components in parallel through SS2D blocks to capture long-range dependencies with linear complexity, then applies a ColorFusion block guided by a background light prior to restore color, producing higher PSNR and SSIM than prior methods on the LSUI and UIEB datasets.

What carries the argument

Mamba SS2D blocks running in parallel on RGB spatial inputs and FFT spectral inputs to model global dependencies linearly while separating color/brightness from texture/noise degradation.

If this is right

  • The model achieves a PSNR of 25.802 and SSIM of 0.913 on the LSUI benchmark, exceeding state-of-the-art methods.
  • Linear complexity allows the approach to scale to high-resolution images without the cost of quadratic attention.
  • The ColorFusion block restores color information with high fidelity using the background light prior.
  • The dual-domain design improves generalization across varied underwater scenes on both LSUI and UIEB.
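The reported metrics are standard. A minimal sketch of PSNR, plus a global (non-windowed) simplification of SSIM that shows the formula; published SSIM scores use an 11x11 Gaussian-windowed variant, so this simplified version is for illustration only:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=1.0):
    """Global (non-windowed) SSIM: the standard formula applied to
    whole-image statistics instead of local Gaussian windows."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

ref = np.random.rand(32, 32)
print(psnr(ref, ref + 0.01))   # uniform 0.01 error -> 40 dB
print(ssim_global(ref, ref))   # identical images -> 1.0
```

A uniform error of 0.01 gives an MSE of 1e-4 and hence exactly 40 dB, which puts the paper's 25.802 dB on LSUI in perspective: typical restored-image error magnitudes remain well above that.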

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same parallel spatial-spectral pattern could be tested on other non-uniform degradations such as haze or low-light scenes.
  • Replacing attention layers with these Mamba blocks in other vision restoration tasks may cut compute while keeping global context.
  • Extending the architecture to video sequences would allow frame-to-frame consistency checks that single-image training cannot provide.

Load-bearing premise

That running Mamba blocks on both the RGB image and its FFT version at the same time will reliably separate color and brightness information from texture and noise across many different underwater conditions.
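The Fourier intuition behind this premise can at least be sanity-checked: global color and brightness live in the lowest spatial frequencies, so a low-pass mask preserves a simulated color cast while discarding texture. A toy sketch, not the paper's experiment:

```python
import numpy as np

def lowpass(rgb, keep=2):
    """Zero out all but the lowest `keep` spatial frequencies per axis.
    The retained DC/low band carries global color and brightness."""
    f = np.fft.fft2(rgb, axes=(0, 1))
    mask = np.zeros(f.shape)
    for ys in (slice(0, keep), slice(-keep, None)):
        for xs in (slice(0, keep), slice(-keep, None)):
            mask[ys, xs] = 1.0        # four corners = low frequencies
    return np.real(np.fft.ifft2(f * mask, axes=(0, 1)))

img = np.random.rand(16, 16, 3)
img[..., 2] *= 0.3                    # simulated blue-channel attenuation
low = lowpass(img)

# the per-channel means (the color cast) survive the low-pass almost exactly
print(np.allclose(low.mean(axis=(0, 1)), img.mean(axis=(0, 1))))  # True
```

This shows the separation is plausible for global casts; whether it holds for the spatially non-uniform degradations underwater scenes actually exhibit is exactly what the premise leaves open.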

What would settle it

An ablation test on the LSUI dataset in which removing the parallel FFT branch produces no drop in PSNR or SSIM, or a new test set of underwater images on which Hero-Mamba falls below the best published CNN or Transformer scores.

Figures

Figures reproduced from arXiv: 2604.16266 by Shivarth Rai, Tejeswar Pokuri.

Figure 1
Figure 1. Visualizing the contribution of the background … [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2
Figure 2. Architectural design of Hero-Mamba, utilizing spatial and spectral domains for accurate feature reconstruction … [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]
Figure 3
Figure 3. Overview of the Encoder Block … [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 4
Figure 4. Overview of the MS-fusion block … [PITH_FULL_IMAGE:figures/full_fig_p004_4.png]
Figure 5
Figure 5. Visual comparison of enhancement results by various models on the LSUI dataset. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png]
Figure 6
Figure 6. Visual comparison of enhancement results by various methods on the UIEB dataset. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png]
Original abstract

Underwater images often suffer from severe degradation, such as color distortion, low contrast, and blurred details, due to light absorption and scattering in water. While learning-based methods like CNNs and Transformers have shown promise, they face critical limitations: CNNs struggle to model the long-range dependencies needed for non-uniform degradation, and Transformers incur quadratic computational complexity, making them inefficient for high-resolution images. To address these challenges, we propose Hero-Mamba, a novel Mamba-based network that achieves efficient dual-domain learning for underwater image enhancement. Our approach uniquely processes information from both the spatial domain (RGB image) and the spectral domain (FFT components) in parallel. This dual-domain input allows the network to decouple degradation factors, separating color/brightness information from texture/noise. The core of our network utilizes Mamba-based SS2D blocks to capture global receptive fields and long-range dependencies with linear complexity, overcoming the limitations of both CNNs and Transformers. Furthermore, we introduce a ColorFusion block, guided by a background light prior, to restore color information with high fidelity. Extensive experiments on the LSUI and UIEB benchmark datasets demonstrate that Hero-Mamba outperforms state-of-the-art methods. Notably, our model achieves a PSNR of 25.802 and an SSIM of 0.913 on LSUI, validating its superior performance and generalization capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Hero-Mamba, a Mamba-based network for underwater image enhancement that processes RGB spatial and FFT spectral domains in parallel via SS2D blocks to capture long-range dependencies at linear complexity. It introduces a ColorFusion block guided by a background light prior for color restoration and reports outperforming prior methods on the LSUI and UIEB benchmarks, with specific metrics of PSNR 25.802 and SSIM 0.913 on LSUI.

Significance. If the reported gains are attributable to the dual-domain Mamba design rather than tuning, the approach offers a computationally efficient alternative to Transformers for non-uniform underwater degradations. The parallel spatial-spectral processing and background-light-guided fusion represent a concrete attempt to separate degradation factors, which could benefit high-resolution marine vision tasks if properly validated.

major comments (2)
  1. [Abstract, §4 Experiments] The manuscript reports benchmark superiority (PSNR 25.802 / SSIM 0.913 on LSUI) but supplies no training protocol, optimizer settings, data-augmentation details, or ablation studies isolating the dual-domain input, the SS2D blocks, or the ColorFusion component. Without these, the central claim that the architecture outperforms the state of the art cannot be distinguished from hyperparameter-driven gains.
  2. [§3.2 Dual-domain design] The assertion that parallel RGB and FFT processing 'decouples color/brightness information from texture/noise' is presented without supporting analysis, feature visualizations, or quantitative metrics demonstrating the separation of degradation factors. This assumption is load-bearing for the network's motivation yet remains untested in the reported experiments.
minor comments (2)
  1. [Figure 1 architecture diagram] The flow from dual-domain inputs through SS2D blocks to ColorFusion could be annotated with tensor dimensions and skip connections to improve reproducibility.
  2. [§2 Related work] The discussion of Mamba variants in vision could include a brief complexity comparison (e.g., vs. Swin Transformer) to contextualize the linear-complexity advantage.
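The suggested complexity comparison is easy to make concrete. A back-of-envelope operation count, assuming naive global self-attention over n = H×W tokens versus a Mamba-style selective scan with a 16-dimensional state; both counts are rough illustrative estimates, not measurements from the paper:

```python
def attention_ops(n, d):
    """Global self-attention: forming QK^T and attn @ V each take
    roughly n * n * d multiply-adds, so cost grows quadratically in n."""
    return 2 * n * n * d

def ssm_scan_ops(n, d, state=16):
    """Mamba-style selective scan: each token updates a `state`-dim
    recurrence per channel, so cost grows linearly in n."""
    return 2 * n * d * state

d = 96  # hypothetical channel width
for side in (64, 128, 256):
    n = side * side                  # tokens in a side x side feature map
    ratio = attention_ops(n, d) // ssm_scan_ops(n, d)
    print(f"{side}x{side}: attention costs {ratio}x the scan")
# prints ratios of 256x, 1024x, 4096x
```

Under these assumptions the gap is simply n / state, which is why the quadratic term dominates as resolution grows and why the linear-complexity claim matters for high-resolution underwater imagery.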

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard deep-learning assumptions about network expressivity and benchmark validity, plus the ad hoc design choice that dual-domain input decouples degradation factors. No formal axioms or proofs are invoked.

free parameters (2)
  • Mamba block hyperparameters (state dimension, expansion factor)
    Chosen during architecture design and training; affect receptive field and capacity.
  • Background light prior estimation parameters
    Used to guide ColorFusion; fitted or tuned on training data.
axioms (2)
  • domain assumption Mamba SS2D blocks capture long-range dependencies with linear complexity
    Invoked to justify superiority over CNNs and Transformers.
  • ad hoc to paper Dual-domain input separates color/brightness from texture/noise
    Core design premise stated in abstract without independent verification.
invented entities (1)
  • ColorFusion block no independent evidence
    purpose: Restore color information using background light prior
    New module introduced in the architecture.
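For context, one widely used family of background-light estimators comes from dark-channel dehazing (He et al.); the paper does not specify which estimator guides its ColorFusion block, so the sketch below only illustrates the kind of prior involved, with hypothetical parameter choices:

```python
import numpy as np

def estimate_background_light(rgb, top_frac=0.001, patch=7):
    """Dark-channel-style background light estimate: erode the
    per-pixel channel minimum over local patches, then average the
    RGB values of the brightest `top_frac` of those pixels."""
    h, w, _ = rgb.shape
    pad = patch // 2
    dark = rgb.min(axis=2)                      # per-pixel channel minimum
    padded = np.pad(dark, pad, mode='edge')
    eroded = np.full_like(dark, np.inf)
    for dy in range(patch):                     # min-filter, patch x patch
        for dx in range(patch):
            eroded = np.minimum(eroded, padded[dy:dy + h, dx:dx + w])
    k = max(1, int(top_frac * h * w))
    idx = np.argsort(eroded.ravel())[-k:]       # brightest dark-channel pixels
    ys, xs = np.unravel_index(idx, (h, w))
    return rgb[ys, xs].mean(axis=0)             # one estimate per channel

# a flat bluish frame should return its own color as the background light
frame = np.ones((32, 32, 3)) * np.array([0.1, 0.2, 0.6])
print(estimate_background_light(frame))  # ~[0.1 0.2 0.6]
```

Whatever the paper's actual estimator, its parameters (here `top_frac` and `patch`) are tuned choices, which is why the ledger lists them as free parameters.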

pith-pipeline@v0.9.0 · 5544 in / 1397 out tokens · 16357 ms · 2026-05-10T08:32:33.343278+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

  1. [1]

    arXiv:1801.04011

Enhancing Underwater Imagery using Generative Adversarial Networks. arXiv:1801.04011. Fan, J.; Xu, J.; Zhou, J.; Meng, D.; and Lin, Y. 2024a. See through water: Heuristic modeling towards color correction for underwater image enhancement. IEEE Transactions on Circuits and Systems for Video Technology. Fan, J.; Xu, J.; Zhou, J.; Meng, D.; and Lin, Y. 20...

  2. [2]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752. Gu, A.; Goel, K.; and Ré, C

  3. [3]

    Efficiently Modeling Long Sequences with Structured State Spaces

Efficiently Modeling Long Sequences with Structured State Spaces. arXiv:2111.00396. Guan, M.; Xu, H.; Jiang, G.; Yu, M.; Chen, Y.; Luo, T.; and Song, Y

  4. [4]

    arXiv:2405.08419

    WaterMamba: Visual State Space Model for Underwater Image Enhancement. arXiv:2405.08419. Guo, C.; Wu, R.; Jin, X.; Han, L.; Chai, Z.; Zhang, W.; and Li, C

  5. [5]

    arXiv:2208.06857

Underwater Ranker: Learn Which Is Better and How to Be Better. arXiv:2208.06857. Hu, Y.; Wang, B.; and Lin, S

  6. [6]

FC4: Fully Convolutional Color Constancy with Confidence-Weighted Pooling. 330–339. Huang, S.; Wang, K.; Liu, H.; Chen, J.; and Li, Y. 2023a. Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank. arXiv:2303.09101. Huang, S.; Wang, K.; Liu, H.; Chen, J.; and Li, Y. 2023b. Contrastive Semi-supervised Learning for Un...

  7. [7]

    Islam, M

Underwater Image Enhancement via Adaptive Group Attention-Based Multiscale Cascade Transformer. IEEE Transactions on Instrumentation and Measurement, 71: 1–18. Islam, M. J.; Xia, Y.; and Sattar, J. 2020a. Fast Underwater Image Enhancement for Improved Visual Perception. arXiv:1903.09766. Islam, M. J.; Xia, Y.; and Sattar, J. 2020b. Fast Underwater Im...

  8. [8]

    Decoupled Weight Decay Regularization

Decoupled Weight Decay Regularization. arXiv:1711.05101. McGlamery, B

  9. [9]

    arXiv:2406.01294

CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement. arXiv:2406.01294. Ren, T.; Xu, H.; Jiang, G.; Yu, M.; Zhang, X.; Wang, B.; and Luo, T

  10. [10]

    Efros, Eli Shechtman, and Oliver Wang

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. arXiv:1801.03924. Zhang, S.; Duan, Y.; Li, D.; and Zhao, R

  11. [11]

    arXiv:2407.19248

Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint. arXiv:2407.19248. Zhao, C.; Cai, W.; Dong, C.; and Hu, C

  12. [12]

    arXiv:2311.16845

Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration. arXiv:2311.16845. Zhou, J.; Liu, D.; Zhang, D.; and Zhang, W