pith. machine review for the scientific record.

arxiv: 2604.16266 · v1 · submitted 2026-04-17 · 💻 cs.CV

Recognition: unknown

Hero-Mamba: Mamba-based Dual Domain Learning for Underwater Image Enhancement

Shivarth Rai, Tejeswar Pokuri

Pith reviewed 2026-05-10 08:32 UTC · model grok-4.3

classification 💻 cs.CV
keywords underwater image enhancement · Mamba · dual-domain learning · SS2D blocks · color restoration · FFT spectral domain · image restoration · state space models

The pith

Hero-Mamba processes RGB images and FFT components in parallel with Mamba blocks to decouple color distortions from texture loss in underwater photos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a Mamba-based architecture can overcome the range limits of CNNs and the quadratic cost of Transformers when restoring underwater images degraded by absorption and scattering. It feeds the network both the spatial RGB view and the spectral FFT view at the same time so that color and brightness factors separate from texture and noise factors. A ColorFusion block then uses a background-light prior to put accurate color back in. Readers would care because this setup delivers linear-complexity global modeling that works on high-resolution inputs and yields measurable gains on standard benchmarks.
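The dual-domain input itself is straightforward to reproduce. A minimal sketch, assuming the spectral view is the per-channel 2-D FFT split into amplitude and phase (the paper does not pin down the exact spectral representation, so that split is an assumption here):

```python
import numpy as np

def dual_domain_views(rgb):
    """Build the two parallel inputs the paper describes: the spatial
    view (the RGB array itself) and a spectral view from its 2-D FFT,
    stored here as per-channel amplitude and phase."""
    spatial = rgb.astype(np.float64)
    spectrum = np.fft.fft2(spatial, axes=(0, 1))   # per-channel 2-D FFT
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)
    return spatial, amplitude, phase

# toy 4x4 RGB image
img = np.random.rand(4, 4, 3)
spatial, amp, ph = dual_domain_views(img)

# the FFT is invertible, so the spectral view loses no information
recon = np.real(np.fft.ifft2(amp * np.exp(1j * ph), axes=(0, 1)))
print(np.allclose(recon, spatial))  # True
```

Because the transform is lossless, whatever separation the network achieves must come from how the branches process the two views, not from the representation alone.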

Core claim

Hero-Mamba is a Mamba-based network for underwater image enhancement that processes spatial-domain RGB images and spectral-domain FFT components in parallel through SS2D blocks to capture long-range dependencies with linear complexity, then applies a ColorFusion block guided by a background light prior to restore color, producing higher PSNR and SSIM than prior methods on the LSUI and UIEB datasets.

What carries the argument

Mamba SS2D blocks running in parallel on RGB spatial inputs and FFT spectral inputs to model global dependencies linearly while separating color/brightness from texture/noise degradation.

If this is right

  • The model achieves a PSNR of 25.802 and SSIM of 0.913 on the LSUI benchmark, exceeding state-of-the-art methods.
  • Linear complexity allows the approach to scale to high-resolution images without the cost of quadratic attention.
  • The ColorFusion block restores color information with high fidelity using the background light prior.
  • The dual-domain design improves generalization across varied underwater scenes on both LSUI and UIEB.
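The reported metrics are standard. A minimal sketch of PSNR, plus a global (non-windowed) simplification of SSIM that shows the formula; published SSIM scores use an 11x11 Gaussian-windowed variant, so this simplified version is for illustration only:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher is better."""
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=1.0):
    """Global (non-windowed) SSIM: the standard formula applied to
    whole-image statistics instead of local Gaussian windows."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_x, mu_y = ref.mean(), test.mean()
    var_x, var_y = ref.var(), test.var()
    cov = ((ref - mu_x) * (test - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

ref = np.random.rand(32, 32)
print(psnr(ref, ref + 0.01))   # uniform 0.01 error -> 40 dB
print(ssim_global(ref, ref))   # identical images -> 1.0
```

A uniform error of 0.01 gives an MSE of 1e-4 and hence exactly 40 dB, which puts the paper's 25.802 dB on LSUI in perspective: typical restored-image error magnitudes remain well above that.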

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same parallel spatial-spectral pattern could be tested on other non-uniform degradations such as haze or low-light scenes.
  • Replacing attention layers with these Mamba blocks in other vision restoration tasks may cut compute while keeping global context.
  • Extending the architecture to video sequences would allow frame-to-frame consistency checks that single-image training cannot provide.

Load-bearing premise

That running Mamba blocks on both the RGB image and its FFT version at the same time will reliably separate color and brightness information from texture and noise across many different underwater conditions.
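The Fourier intuition behind this premise can at least be sanity-checked: global color and brightness live in the lowest spatial frequencies, so a low-pass mask preserves a simulated color cast while discarding texture. A toy sketch, not the paper's experiment:

```python
import numpy as np

def lowpass(rgb, keep=2):
    """Zero out all but the lowest `keep` spatial frequencies per axis.
    The retained DC/low band carries global color and brightness."""
    f = np.fft.fft2(rgb, axes=(0, 1))
    mask = np.zeros(f.shape)
    for ys in (slice(0, keep), slice(-keep, None)):
        for xs in (slice(0, keep), slice(-keep, None)):
            mask[ys, xs] = 1.0        # four corners = low frequencies
    return np.real(np.fft.ifft2(f * mask, axes=(0, 1)))

img = np.random.rand(16, 16, 3)
img[..., 2] *= 0.3                    # simulated blue-channel attenuation
low = lowpass(img)

# the per-channel means (the color cast) survive the low-pass almost exactly
print(np.allclose(low.mean(axis=(0, 1)), img.mean(axis=(0, 1))))  # True
```

This shows the separation is plausible for global casts; whether it holds for the spatially non-uniform degradations underwater scenes actually exhibit is exactly what the premise leaves open.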

What would settle it

An ablation test on the LSUI dataset in which removing the parallel FFT branch produces no drop in PSNR or SSIM, or a new test set of underwater images on which Hero-Mamba falls below the best published CNN or Transformer scores.

Figures

Figures reproduced from arXiv: 2604.16266 by Shivarth Rai, Tejeswar Pokuri.

Figure 1
Figure 1. Visualizing the contribution of the background … [PITH_FULL_IMAGE:figures/full_fig_p001_1.png]
Figure 2
Figure 2. Architectural design of Hero-Mamba, utilizing spatial and spectral domains for accurate feature reconstruction … [PITH_FULL_IMAGE:figures/full_fig_p003_2.png]
Figure 3
Figure 3. Overview of the Encoder Block … [PITH_FULL_IMAGE:figures/full_fig_p004_3.png]
Figure 4
Figure 4. Overview of the MS-fusion block … [PITH_FULL_IMAGE:figures/full_fig_p004_4.png]
Figure 5
Figure 5. Visual comparison of enhancement results by various models on the LSUI dataset. [PITH_FULL_IMAGE:figures/full_fig_p006_5.png]
Figure 6
Figure 6. Visual comparison of enhancement results by various methods on the UIEB dataset. [PITH_FULL_IMAGE:figures/full_fig_p007_6.png]
Original abstract

Underwater images often suffer from severe degradation, such as color distortion, low contrast, and blurred details, due to light absorption and scattering in water. While learning-based methods like CNNs and Transformers have shown promise, they face critical limitations: CNNs struggle to model the long-range dependencies needed for non-uniform degradation, and Transformers incur quadratic computational complexity, making them inefficient for high-resolution images. To address these challenges, we propose Hero-Mamba, a novel Mamba-based network that achieves efficient dual-domain learning for underwater image enhancement. Our approach uniquely processes information from both the spatial domain (RGB image) and the spectral domain (FFT components) in parallel. This dual-domain input allows the network to decouple degradation factors, separating color/brightness information from texture/noise. The core of our network utilizes Mamba-based SS2D blocks to capture global receptive fields and long-range dependencies with linear complexity, overcoming the limitations of both CNNs and Transformers. Furthermore, we introduce a ColorFusion block, guided by a background light prior, to restore color information with high fidelity. Extensive experiments on the LSUI and UIEB benchmark datasets demonstrate that Hero-Mamba outperforms state-of-the-art methods. Notably, our model achieves a PSNR of 25.802 and an SSIM of 0.913 on LSUI, validating its superior performance and generalization capabilities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Hero-Mamba, a Mamba-based network for underwater image enhancement that processes RGB spatial and FFT spectral domains in parallel via SS2D blocks to capture long-range dependencies at linear complexity. It introduces a ColorFusion block guided by a background light prior for color restoration and reports outperforming prior methods on the LSUI and UIEB benchmarks, with specific metrics of PSNR 25.802 and SSIM 0.913 on LSUI.

Significance. If the reported gains are attributable to the dual-domain Mamba design rather than tuning, the approach offers a computationally efficient alternative to Transformers for non-uniform underwater degradations. The parallel spatial-spectral processing and background-light-guided fusion represent a concrete attempt to separate degradation factors, which could benefit high-resolution marine vision tasks if properly validated.

major comments (2)
  1. [Abstract, §4 Experiments] The manuscript reports benchmark superiority (PSNR 25.802 / SSIM 0.913 on LSUI) but supplies no training protocol, optimizer settings, data-augmentation details, or ablation studies isolating the dual-domain input, the SS2D blocks, or the ColorFusion component. Without these, the central claim that the architecture outperforms the state of the art cannot be distinguished from hyperparameter-driven gains.
  2. [§3.2 Dual-domain design] The assertion that parallel RGB and FFT processing 'decouples color/brightness information from texture/noise' is presented without supporting analysis, feature visualizations, or quantitative metrics demonstrating the separation of degradation factors. This assumption is load-bearing for the network's motivation yet remains untested in the reported experiments.
minor comments (2)
  1. [Figure 1 architecture diagram] The flow from dual-domain inputs through SS2D blocks to ColorFusion could be annotated with tensor dimensions and skip connections to improve reproducibility.
  2. [§2 Related work] The discussion of Mamba variants in vision could include a brief complexity comparison (e.g., vs. Swin Transformer) to contextualize the linear-complexity advantage.
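The suggested complexity comparison is easy to make concrete. A back-of-envelope operation count, assuming naive global self-attention over n = H×W tokens versus a Mamba-style selective scan with a 16-dimensional state; both counts are rough illustrative estimates, not measurements from the paper:

```python
def attention_ops(n, d):
    """Global self-attention: forming QK^T and attn @ V each take
    roughly n * n * d multiply-adds, so cost grows quadratically in n."""
    return 2 * n * n * d

def ssm_scan_ops(n, d, state=16):
    """Mamba-style selective scan: each token updates a `state`-dim
    recurrence per channel, so cost grows linearly in n."""
    return 2 * n * d * state

d = 96  # hypothetical channel width
for side in (64, 128, 256):
    n = side * side                  # tokens in a side x side feature map
    ratio = attention_ops(n, d) // ssm_scan_ops(n, d)
    print(f"{side}x{side}: attention costs {ratio}x the scan")
# prints ratios of 256x, 1024x, 4096x
```

Under these assumptions the gap is simply n / state, which is why the quadratic term dominates as resolution grows and why the linear-complexity claim matters for high-resolution underwater imagery.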

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claim rests on standard deep-learning assumptions about network expressivity and benchmark validity, plus the ad hoc design choice that dual-domain input decouples degradation factors. No formal axioms or proofs are invoked.

free parameters (2)
  • Mamba block hyperparameters (state dimension, expansion factor)
    Chosen during architecture design and training; affect receptive field and capacity.
  • Background light prior estimation parameters
    Used to guide ColorFusion; fitted or tuned on training data.
axioms (2)
  • domain assumption Mamba SS2D blocks capture long-range dependencies with linear complexity
    Invoked to justify superiority over CNNs and Transformers.
  • ad hoc to paper Dual-domain input separates color/brightness from texture/noise
    Core design premise stated in abstract without independent verification.
invented entities (1)
  • ColorFusion block no independent evidence
    purpose: Restore color information using background light prior
    New module introduced in the architecture.
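For context, one widely used family of background-light estimators comes from dark-channel dehazing (He et al.); the paper does not specify which estimator guides its ColorFusion block, so the sketch below only illustrates the kind of prior involved, with hypothetical parameter choices:

```python
import numpy as np

def estimate_background_light(rgb, top_frac=0.001, patch=7):
    """Dark-channel-style background light estimate: erode the
    per-pixel channel minimum over local patches, then average the
    RGB values of the brightest `top_frac` of those pixels."""
    h, w, _ = rgb.shape
    pad = patch // 2
    dark = rgb.min(axis=2)                      # per-pixel channel minimum
    padded = np.pad(dark, pad, mode='edge')
    eroded = np.full_like(dark, np.inf)
    for dy in range(patch):                     # min-filter, patch x patch
        for dx in range(patch):
            eroded = np.minimum(eroded, padded[dy:dy + h, dx:dx + w])
    k = max(1, int(top_frac * h * w))
    idx = np.argsort(eroded.ravel())[-k:]       # brightest dark-channel pixels
    ys, xs = np.unravel_index(idx, (h, w))
    return rgb[ys, xs].mean(axis=0)             # one estimate per channel

# a flat bluish frame should return its own color as the background light
frame = np.ones((32, 32, 3)) * np.array([0.1, 0.2, 0.6])
print(estimate_background_light(frame))  # ~[0.1 0.2 0.6]
```

Whatever the paper's actual estimator, its parameters (here `top_frac` and `patch`) are tuned choices, which is why the ledger lists them as free parameters.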

pith-pipeline@v0.9.0 · 5544 in / 1397 out tokens · 16357 ms · 2026-05-10T08:32:33.343278+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages · 2 internal anchors

  1. [1]

    arXiv:1801.04011

Enhancing Underwater Imagery using Generative Adversarial Networks. arXiv:1801.04011. Fan, J.; Xu, J.; Zhou, J.; Meng, D.; and Lin, Y. 2024a. See through water: Heuristic modeling towards color correction for underwater image enhancement. IEEE Transactions on Circuits and Systems for Video Technology. Fan, J.; Xu, J.; Zhou, J.; Meng, D.; and Lin, Y. 20...

  2. [2]

    Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv:2312.00752. Gu, A.; Goel, K.; and Ré, C

  3. [3]

    Efficiently Modeling Long Sequences with Structured State Spaces

Efficiently Modeling Long Sequences with Structured State Spaces. arXiv:2111.00396. Guan, M.; Xu, H.; Jiang, G.; Yu, M.; Chen, Y.; Luo, T.; and Song, Y

  4. [4]

    arXiv:2405.08419

    WaterMamba: Visual State Space Model for Underwater Image Enhancement. arXiv:2405.08419. Guo, C.; Wu, R.; Jin, X.; Han, L.; Chai, Z.; Zhang, W.; and Li, C

  5. [5]

    arXiv:2208.06857

Underwater Ranker: Learn Which Is Better and How to Be Better. arXiv:2208.06857. Hu, Y.; Wang, B.; and Lin, S

  6. [6]

FC4: Fully Convolutional Color Constancy with Confidence-Weighted Pooling. 330–339. Huang, S.; Wang, K.; Liu, H.; Chen, J.; and Li, Y. 2023a. Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank. arXiv:2303.09101. Huang, S.; Wang, K.; Liu, H.; Chen, J.; and Li, Y. 2023b. Contrastive Semi-supervised Learning for Un...

  7. [7]

    Islam, M

Underwater Image Enhancement via Adaptive Group Attention-Based Multiscale Cascade Transformer. IEEE Transactions on Instrumentation and Measurement, 71: 1–18. Islam, M. J.; Xia, Y.; and Sattar, J. 2020a. Fast Underwater Image Enhancement for Improved Visual Perception. arXiv:1903.09766. Islam, M. J.; Xia, Y.; and Sattar, J. 2020b. Fast Underwater Im...

  8. [8]

    Decoupled Weight Decay Regularization

Decoupled Weight Decay Regularization. arXiv:1711.05101. McGlamery, B

  9. [9]

    arXiv:2406.01294

CE-VAE: Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement. arXiv:2406.01294. Ren, T.; Xu, H.; Jiang, G.; Yu, M.; Zhang, X.; Wang, B.; and Luo, T

  10. [10]

    Efros, Eli Shechtman, and Oliver Wang

The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. arXiv:1801.03924. Zhang, S.; Duan, Y.; Li, D.; and Zhao, R

  11. [11]

    arXiv:2407.19248

Mamba-UIE: Enhancing Underwater Images with Physical Model Constraint. arXiv:2407.19248. Zhao, C.; Cai, W.; Dong, C.; and Hu, C

  12. [12]

    arXiv:2311.16845

Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration. arXiv:2311.16845. Zhou, J.; Liu, D.; Zhang, D.; and Zhang, W