pith. machine review for the scientific record.

arxiv: 2604.27653 · v1 · submitted 2026-04-30 · 💻 cs.CV

Recognition: unknown

FUN: A Focal U-Net Combining Reconstruction and Object Detection for Snapshot Spectral Imaging

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 07:14 UTC · model grok-4.3

classification 💻 cs.CV
keywords hyperspectral imaging · snapshot spectral imaging · object detection · image reconstruction · multi-task learning · U-Net · focal modulation · neural network architecture

The pith

A single network jointly reconstructs hyperspectral images and detects objects with reduced resources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents the Focal U-shaped Network (FUN) as an end-to-end solution for snapshot spectral imaging that performs both hyperspectral image reconstruction and object detection simultaneously. The goal is to overcome the speed limitations of traditional push-broom systems and the reconstruction delays in snapshot methods by using multi-task learning on a shared backbone. Reconstruction supplies detailed spectral information while detection provides semantic priors, and focal modulation mixes spatial and spectral features efficiently without the quadratic cost of self-attention. The approach is validated on a new dataset, showing improved performance on both tasks alongside significant reductions in model size and computation, pointing toward practical real-time applications.

Core claim

The central discovery is that a U-shaped network incorporating focal modulation can serve as a shared backbone for simultaneous HSI reconstruction and object detection, where the two tasks mutually enhance each other through joint training, resulting in state-of-the-art accuracy with substantially lower parameter counts and computational demands compared to prior methods.

What carries the argument

Focal modulation applied in a shared U-Net backbone for multi-task learning between spectral reconstruction and object detection.
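The load-bearing mechanism can be sketched in a few lines. This is a minimal illustration of the focal-modulation idea (Yang et al., "Focal Modulation Networks"), not the paper's actual module: random matrices stand in for learned projections, and repeated 3×3 box filters stand in for the depthwise convolutions that build the hierarchy of context levels.

```python
import numpy as np

def box3(x):
    """3x3 mean filter with edge padding; one coarsening step."""
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), mode="edge")
    H, W, _ = x.shape
    return sum(xp[i:i + H, j:j + W] for i in range(3) for j in range(3)) / 9.0

def focal_modulation(x, num_levels=3, seed=0):
    """Early aggregation: each position's query is scaled elementwise by a
    gated sum of progressively coarser context summaries, avoiding the
    O((HW)^2) pairwise interactions of self-attention."""
    rng = np.random.default_rng(seed)
    H, W, C = x.shape
    w_q = rng.standard_normal((C, C)) / np.sqrt(C)   # query projection (stand-in)
    w_g = rng.standard_normal((C, num_levels))       # per-level gate projection
    w_m = rng.standard_normal((C, C)) / np.sqrt(C)   # modulator projection
    q = x @ w_q
    gates = 1.0 / (1.0 + np.exp(-(x @ w_g)))         # sigmoid gates, one per level
    ctx, level = np.zeros_like(x), x
    for l in range(num_levels):                      # growing receptive field
        level = box3(level)
        ctx += level * gates[..., l:l + 1]
    return q * (ctx @ w_m)                           # elementwise modulation

feat = np.random.default_rng(1).standard_normal((8, 8, 16))
out = focal_modulation(feat)
print(out.shape)  # (8, 8, 16)
```

The cost is linear in the number of pixels, which is the efficiency argument the review attributes to replacing self-attention.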

If this is right

  • Real-time object detection becomes feasible directly from snapshot HSIs without a separate reconstruction phase.
  • The model requires 40% fewer parameters and 30% less computation than recent alternatives while maintaining top performance.
  • The new dataset with 363 HSIs and 8712 annotations supports further development of joint methods.
  • Edge deployment for hyperspectral sensing applications is more feasible due to the efficiency gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This joint training approach might extend to other paired tasks in imaging, such as denoising combined with segmentation.
  • If focal modulation proves general, it could replace attention mechanisms in other U-Net variants for better efficiency across vision tasks.
  • The architecture could support portable hyperspectral devices for real-world monitoring once optimized further for constrained hardware.

Load-bearing premise

Multi-task learning on a shared backbone with focal modulation will create positive interactions between the reconstruction and detection tasks.
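In training terms, this premise amounts to a weighted joint objective on the shared backbone. A minimal sketch; the coefficient values are illustrative placeholders, not figures from the paper:

```python
def joint_loss(rec_loss, det_loss, lam_rec=1.0, lam_det=0.5):
    """Weighted sum of per-task losses on a shared backbone.
    lam_rec / lam_det are the balancing coefficients the ledger below lists
    as free parameters; the defaults here are made up for illustration."""
    return lam_rec * rec_loss + lam_det * det_loss

# Gradients of this single scalar flow into the shared encoder from both
# heads; whether that interaction is positive or negative transfer is
# exactly what the premise asserts and an ablation would have to test.
print(round(joint_loss(0.8, 1.2), 6))  # 1.4
```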

What would settle it

A head-to-head comparison on the new dataset: if a two-stage pipeline that first reconstructs the HSI and then runs a separate detector achieved higher accuracy or lower total latency than the end-to-end FUN, the joint-training premise would be undermined.

Figures

Figures reproduced from arXiv: 2604.27653 by Ang Gao, Anqi Li, Dahua Gao, Danhua Liu, Guangming Shi, Yubo Dong, Zhenyuan Lin.

Figure 1
Figure 1: Comparison of different object detection approaches on the CASSI system. (a) The first approach directly applies object detection to the measurement. (b) The second approach reconstructs the measurement into the HSI, then performs object detection on the reconstructed HSI. (c) Our method employs a U-shaped neural network to extract multi-scale features and leverages them to simultaneously perform HSI reco…
Figure 2
Figure 2: Diagram of the CASSI System. A 3-D spectral scene is first modulated by a coded aperture and a disperser (prism), then compressed into a measurement by a 2-D detector.
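The compression this caption describes is a per-band shift-and-sum. A minimal sketch of a CASSI-style forward model, assuming a single disperser with a one-pixel shift per band (an assumption for illustration, not a detail taken from the paper):

```python
import numpy as np

def cassi_forward(hsi, mask, step=1):
    """Coded-aperture snapshot: each spectral band is masked by the coded
    aperture, sheared along the width by the disperser, and summed onto a
    single 2-D detector measurement."""
    H, W, B = hsi.shape
    y = np.zeros((H, W + (B - 1) * step))
    for b in range(B):                       # per-band shift-and-accumulate
        y[:, b * step:b * step + W] += mask * hsi[:, :, b]
    return y

hsi = np.ones((2, 3, 2))                     # toy 2-band scene
mask = np.ones((2, 3))                       # fully open coded aperture
y = cassi_forward(hsi, mask)
print(y.shape)  # (2, 4)
```

Inverting this many-to-one mapping is the reconstruction problem whose latency FUN is designed to sidestep.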
Figure 4
Figure 4: Comparison of Self-Attention and Focal Modulation. Self-Attention adopts a late aggregation approach, while Focal Modulation employs an early aggregation approach in representation learning.
Figure 6
Figure 6: Diagram of the Low-Rank Spectral Modulation (LRSM). The LRSM projects the latent representation into a subspace to obtain a low-rank spectral vector and aggregates global spectral information from a low-rank memory to obtain an enhanced low-rank spectral vector.
Figure 7
Figure 7: Comparisons of reconstructed HSIs across various wavelengths (533 nm, 565 nm, 579 nm, and 609 nm). The bottom-left shows the spectral curves corresponding to the green boxes in the measurement. The right side displays the reconstructed HSIs using different methods, with the enlarged patches corresponding to the green boxes shown in the bottom-right corner of the reconstructed HSIs. Zoom in for a better view.
Figure 8
Figure 8: Detection results by various methods on three different scenes. The proposed FUN not only detects more objects with higher confidence, but also disentangles and identifies densely packed objects, maintaining high accuracy and minimizing overlap among bounding boxes.
Original abstract

Conventional push-broom hyperspectral imaging suffers from slow acquisition speeds, precluding real-time object detection; in contrast, snapshot spectral imaging enables instantaneous hyperspectral images (HSIs) capture, making real-time object detection feasible, yet its potential is often compromised by time-consuming post-capture reconstruction. To address this issue, we propose the Focal U-shaped Network (FUN), a novel end-to-end framework that jointly performs HSI reconstruction and object detection via multi-task learning. FUN employs a shared U-shaped backbone, where reconstruction provides underlying spectral information while detection guides semantic-aware priors learning, facilitating mutually beneficial task interaction. Crucially, we introduce focal modulation, an efficient alternative to self-attention that modulates spatial and spectral features while reducing quadratic computational complexity, enabling a self-attention-free architecture for joint reconstruction and detection. Furthermore, we contribute a new HSI object detection dataset with 8712 annotated objects across 363 HSIs to facilitate evaluation of the proposed method. Experiments demonstrate that FUN achieves state-of-the-art performance on both tasks, using 40% fewer parameters and 30% less computation than recent alternatives, making it promising for future real-time edge deployment. The code and datasets are available: https://github.com/ShawnDong98/FUN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes FUN, a Focal U-shaped Network for joint hyperspectral image reconstruction and object detection in snapshot spectral imaging. It employs a shared U-Net backbone with focal modulation (replacing self-attention) to enable efficient multi-task learning where reconstruction supplies spectral details and detection supplies semantic priors. The authors also release a new dataset of 363 HSIs containing 8712 annotated objects and report state-of-the-art performance on both tasks together with 40% fewer parameters and 30% less computation than recent alternatives.

Significance. If the multi-task interaction and efficiency claims can be rigorously verified, the work would be significant for real-time edge deployment of snapshot hyperspectral systems, as it directly tackles the reconstruction bottleneck while maintaining detection accuracy. The public release of code and the new dataset is a clear positive that would facilitate follow-on research.

major comments (2)
  1. [Abstract and §4] Abstract and §4 (Experiments): The central claim that the shared backbone plus focal modulation produces 'mutually beneficial task interaction' and that 'detection guides semantic-aware priors learning' is load-bearing for the SOTA and efficiency assertions, yet no ablations are described that isolate joint training from single-task baselines, vary the loss-balancing coefficients, or test for negative transfer. Without these, the reported gains cannot be confidently attributed to the multi-task design rather than focal modulation alone or the new dataset.
  2. [§4] §4 (Experiments) and Table 1/2: The quantitative SOTA claims lack error bars, standard deviations across multiple runs, or statistical significance tests against baselines. This makes it impossible to determine whether the 40% parameter and 30% compute reductions are robust or sensitive to hyperparameter choices.
minor comments (2)
  1. [§3] §3 (Method): The description of focal modulation would benefit from a short equation or diagram showing how it modulates spatial-spectral features, as readers may not be familiar with the referenced prior work.
  2. [§4] §4: The new dataset is introduced without a table summarizing its statistics (e.g., number of classes, spectral bands, train/val/test splits), which would help readers assess its difficulty relative to existing HSI detection benchmarks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed each major comment and provide point-by-point responses below, outlining the revisions we will implement to strengthen the paper's claims regarding multi-task interaction and the robustness of the reported results.

Point-by-point responses
  1. Referee: [Abstract and §4] Abstract and §4 (Experiments): The central claim that the shared backbone plus focal modulation produces 'mutually beneficial task interaction' and that 'detection guides semantic-aware priors learning' is load-bearing for the SOTA and efficiency assertions, yet no ablations are described that isolate joint training from single-task baselines, vary the loss-balancing coefficients, or test for negative transfer. Without these, the reported gains cannot be confidently attributed to the multi-task design rather than focal modulation alone or the new dataset.

    Authors: We appreciate the referee's emphasis on rigorously isolating the contribution of multi-task learning. While the manuscript presents comparisons to recent single-task and multi-task baselines in Tables 1 and 2, we acknowledge that dedicated ablations isolating joint training effects were not included. In the revised manuscript, we will add a new ablation subsection in §4 that includes: (1) direct comparisons of the shared FUN backbone under joint training versus independently trained single-task reconstruction and detection models, (2) sweeps over loss-balancing coefficients (λ_rec and λ_det) to demonstrate robustness and optimal interaction, and (3) explicit checks confirming the absence of negative transfer. These additions will allow readers to attribute performance gains more confidently to the proposed multi-task design with focal modulation. revision: yes

  2. Referee: [§4] §4 (Experiments) and Table 1/2: The quantitative SOTA claims lack error bars, standard deviations across multiple runs, or statistical significance tests against baselines. This makes it impossible to determine whether the 40% parameter and 30% compute reductions are robust or sensitive to hyperparameter choices.

    Authors: We thank the referee for this important point on statistical rigor. The reported 40% parameter reduction and 30% lower computation are deterministic architectural metrics (derived from model parameter counts and FLOPs) and are therefore insensitive to random seeds or hyperparameter variation. For the task-specific metrics (PSNR, SSIM, mAP, etc.), we will rerun all experiments using at least three different random seeds and report mean ± standard deviation in the updated Tables 1 and 2. We will also add pairwise statistical significance tests (e.g., paired t-tests with p-values) against the primary baselines. These updates will be incorporated into the revised §4 and tables. revision: yes
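The rerun-with-seeds plan can be sketched with standard-library tools. The `psnr_a` / `psnr_b` values are made-up per-seed scores for illustration; only the paired t statistic is computed here, since mapping it to a p-value requires a t-distribution CDF (e.g. scipy), which is not assumed:

```python
import statistics as st

def mean_std(runs):
    """Mean and sample standard deviation of a metric across seeds."""
    return st.mean(runs), st.stdev(runs)

def paired_t(a, b):
    """Paired t statistic for two methods scored under the same seeds:
    mean of per-seed differences over its standard error."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    return st.mean(d) / (st.stdev(d) / n ** 0.5)

psnr_a = [34.1, 34.4, 34.2]   # hypothetical FUN PSNR over 3 seeds
psnr_b = [33.6, 33.9, 33.8]   # hypothetical baseline PSNR, same seeds
m, s = mean_std(psnr_a)
t = paired_t(psnr_a, psnr_b)
```

Reporting `m ± s` per table cell and `t` against each baseline is the shape of the revision the rebuttal promises.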

Circularity Check

0 steps flagged

No circularity; empirical claims rest on architecture proposal and external benchmarks

Full rationale

The paper introduces a novel multi-task U-Net architecture with focal modulation for joint HSI reconstruction and object detection, contributes a new dataset, and reports empirical SOTA results with efficiency gains. No derivation chain exists that reduces predictions or uniqueness claims to self-defined quantities, fitted parameters renamed as outputs, or load-bearing self-citations. Performance assertions are validated via comparisons to prior external methods rather than internal equations or ansatzes that presuppose the result. The multi-task interaction is presented as a design choice evaluated experimentally, not derived by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on empirical training of a deep neural network whose weights are fitted to data, plus domain assumptions about multi-task benefits and the effectiveness of focal modulation; no new physical entities are postulated.

free parameters (2)
  • Network weights and hyperparameters
    The model parameters are optimized via gradient descent on the combined reconstruction and detection losses using the provided dataset.
  • Task loss balancing coefficients
    Weights used to combine the reconstruction loss and detection loss during multi-task training.
axioms (2)
  • domain assumption Focal modulation effectively captures spatial and spectral dependencies as an alternative to self-attention
    Invoked to justify replacing attention mechanisms while maintaining performance.
  • domain assumption Joint multi-task optimization yields better results on both tasks than separate training
    Core premise of the shared backbone and mutual guidance between reconstruction and detection.

pith-pipeline@v0.9.0 · 5540 in / 1553 out tokens · 113892 ms · 2026-05-07T07:14:55.830337+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

56 extracted references · 2 canonical work pages · 2 internal anchors

  1. [1]

    Snapshot multispectral endomicroscopy,

Z. Meng, M. Qiao, J. Ma, Z. Yu, K. Xu, and X. Yuan, “Snapshot multispectral endomicroscopy,” Optics Letters, vol. 45, no. 14, pp. 3897–3900, 2020

  2. [2]

    Hyper-skin: a hyperspectral dataset for reconstructing facial skin-spectra from rgb images,

P. C. Ng, Z. Chi, Y. Verdie, J. Lu, and K. N. Plataniotis, “Hyper-skin: a hyperspectral dataset for reconstructing facial skin-spectra from rgb images,” Advances in Neural Information Processing Systems, vol. 36, 2024

  3. [3]

A novel spectral-spatial multi-scale network for hyperspectral image classification with the res2net block,

Z. Zhang, D. Liu, D. Gao, and G. Shi, “A novel spectral-spatial multi-scale network for hyperspectral image classification with the res2net block,” International Journal of Remote Sensing, vol. 43, no. 3, pp. 751–777, 2022

  4. [4]

    No-reference hyperspectral image quality assessment via ranking feature learning,

Y. Li, Y. Dong, H. Li, D. Liu, F. Xue, and D. Gao, “No-reference hyperspectral image quality assessment via ranking feature learning,” Remote Sensing, vol. 16, no. 10, p. 1657, 2024

  5. [5]

    3d imaging spectroscopy for measuring hyperspectral patterns on solid objects,

M. H. Kim, T. A. Harvey, D. S. Kittle, H. Rushmeier, J. Dorsey, R. O. Prum, and D. J. Brady, “3d imaging spectroscopy for measuring hyperspectral patterns on solid objects,” ACM Transactions on Graphics (TOG), vol. 31, no. 4, pp. 1–11, 2012

  6. [6]

    Compressive coded aperture spectral imaging: An introduction,

G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging: An introduction,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 105–115, 2013

  7. [7]

High-speed hyperspectral video acquisition with a dual-camera architecture,

L. Wang, Z. Xiong, D. Gao, G. Shi, W. Zeng, and F. Wu, “High-speed hyperspectral video acquisition with a dual-camera architecture,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4942–4950

  8. [8]

    U-net: Convolutional networks for biomedical image segmentation,

O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18. Springer, 2015, pp. 234–241

  9. [9]

λ-net: Reconstruct hyperspectral images from a snapshot measurement,

X. Miao, X. Yuan, Y. Pu, and V. Athitsos, “λ-net: Reconstruct hyperspectral images from a snapshot measurement,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 4059–4069

  10. [10]

    End-to-end low cost compressive spectral imaging with spatial-spectral self-attention,

Z. Meng, J. Ma, and X. Yuan, “End-to-end low cost compressive spectral imaging with spatial-spectral self-attention,” in European Conference on Computer Vision. Springer, 2020, pp. 187–204

  11. [11]

    Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction,

Y. Cai, J. Lin, X. Hu, H. Wang, X. Yuan, Y. Zhang, R. Timofte, and L. Van Gool, “Mask-guided spectral-wise transformer for efficient hyperspectral image reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17502–17511

  12. [12]

Coarse-to-fine sparse transformer for hyperspectral image reconstruction,

——, “Coarse-to-fine sparse transformer for hyperspectral image reconstruction,” in European Conference on Computer Vision. Springer, 2022, pp. 686–704

  13. [13]

    Feature pyramid networks for object detection,

T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125

  14. [14]

    Focal modulation networks,

J. Yang, C. Li, X. Dai, and J. Gao, “Focal modulation networks,” Advances in Neural Information Processing Systems, vol. 35, pp. 4203–4217, 2022

  15. [15]

Degradation estimation recurrent neural network with local and non-local priors for compressive spectral imaging,

Y. Dong, D. Gao, Y. Li, G. Shi, and D. Liu, “Degradation estimation recurrent neural network with local and non-local priors for compressive spectral imaging,” IEEE Transactions on Geoscience and Remote Sensing, 2024

  16. [16]

    Degradation-aware unfolding half-shuffle transformer for spectral compressive imaging,

Y. Cai, J. Lin, H. Wang, X. Yuan, H. Ding, Y. Zhang, R. Timofte, and L. V. Gool, “Degradation-aware unfolding half-shuffle transformer for spectral compressive imaging,” Advances in Neural Information Processing Systems, vol. 35, pp. 37749–37761, 2022

  17. [17]

Computational hyperspectral imaging based on dimension-discriminative low-rank tensor recovery,

S. Zhang, L. Wang, Y. Fu, X. Zhong, and H. Huang, “Computational hyperspectral imaging based on dimension-discriminative low-rank tensor recovery,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 10183–10192

  18. [18]

    Hyperspectral compressive snapshot reconstruction via coupled low-rank subspace representation and self-supervised deep network,

Y. Chen, W. Lai, W. He, X.-L. Zhao, and J. Zeng, “Hyperspectral compressive snapshot reconstruction via coupled low-rank subspace representation and self-supervised deep network,” IEEE Transactions on Image Processing, 2024

  19. [19]

    Spectral enhanced rectangle transformer for hyperspectral image denoising,

M. Li, J. Liu, Y. Fu, Y. Zhang, and D. Dou, “Spectral enhanced rectangle transformer for hyperspectral image denoising,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5805–5814

  20. [20]

    Faster nonconvex low-rank matrix learning for image low-level and high-level vision: A unified framework,

H. Zhang, J. Yang, J. Qian, C. Gong, X. Ning, Z. Zha, and B. Wen, “Faster nonconvex low-rank matrix learning for image low-level and high-level vision: A unified framework,” Information Fusion, vol. 108, p. 102347, 2024

  21. [21]

Efficient image classification via structured low-rank matrix factorization regression,

H. Zhang, J. Yang, J. Qian, G. Gao, X. Lan, Z. Zha, and B. Wen, “Efficient image classification via structured low-rank matrix factorization regression,” IEEE Transactions on Information Forensics and Security, vol. 19, pp. 1496–1509, 2023

  22. [22]

Accelerated palm for nonconvex low-rank matrix recovery with theoretical analysis,

H. Zhang, B. Wen, Z. Zha, B. Zhang, Y. Tang, G. Yu, and W. Du, “Accelerated palm for nonconvex low-rank matrix recovery with theoretical analysis,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 4, pp. 2304–2317, 2023

  23. [23]

Efficient and effective nonconvex low-rank subspace clustering via svt-free operators,

H. Zhang, S. Li, J. Qiu, Y. Tang, J. Wen, Z. Zha, and B. Wen, “Efficient and effective nonconvex low-rank subspace clustering via svt-free operators,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 12, pp. 7515–7529, 2023

  24. [24]

    Enhanced acceleration for generalized nonconvex low-rank matrix learning,

H. Zhang, J. Yang, W. Du, B. Zhang, Z. Zha, and B. Wen, “Enhanced acceleration for generalized nonconvex low-rank matrix learning,” Chinese Journal of Electronics, vol. 34, no. 1, pp. 98–113, 2025

  25. [25]

    Low-rank tensor meets deep prior: Coupling model-driven and data-driven methods for hyperspectral image reconstruction,

Y. Chen, F. Yuan, W. Lai, J. Zeng, W. He, and Q. Huang, “Low-rank tensor meets deep prior: Coupling model-driven and data-driven methods for hyperspectral image reconstruction,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2025

  26. [26]

    Dual-camera design for coded aperture snapshot spectral imaging,

L. Wang, Z. Xiong, D. Gao, G. Shi, and F. Wu, “Dual-camera design for coded aperture snapshot spectral imaging,” Applied Optics, vol. 54, no. 4, pp. 848–858, 2015

  27. [27]

    A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration,

J. M. Bioucas-Dias and M. A. Figueiredo, “A new twist: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Transactions on Image Processing, vol. 16, no. 12, pp. 2992–3004, 2007

  28. [28]

Generalized alternating projection based total variation minimization for compressive sensing,

X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in 2016 IEEE International Conference on Image Processing (ICIP). IEEE, 2016, pp. 2539–2543

  29. [29]

    Rank minimization for snapshot compressive imaging,

Y. Liu, X. Yuan, J. Suo, D. J. Brady, and Q. Dai, “Rank minimization for snapshot compressive imaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 12, pp. 2990–3006, 2018

  30. [30]

    Combining low-rank and deep plug-and-play priors for snapshot compressive imaging,

Y. Chen, X. Gui, J. Zeng, X.-L. Zhao, and W. He, “Combining low-rank and deep plug-and-play priors for snapshot compressive imaging,” IEEE Transactions on Neural Networks and Learning Systems, 2023

  31. [31]

Plug-and-play algorithms for large-scale snapshot compressive imaging,

X. Yuan, Y. Liu, J. Suo, and Q. Dai, “Plug-and-play algorithms for large-scale snapshot compressive imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 1447–1457

  32. [32]

Prior images guided generative autoencoder model for dual-camera compressive spectral imaging,

Y. Chen, Y. Wang, and H. Zhang, “Prior images guided generative autoencoder model for dual-camera compressive spectral imaging,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 9, pp. 8629–8643, 2024

  33. [33]

    Deep gaussian scale mixture prior for spectral compressive imaging,

T. Huang, W. Dong, X. Yuan, J. Wu, and G. Shi, “Deep gaussian scale mixture prior for spectral compressive imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 16216–16225

  34. [34]

    Residual degradation learning unfolding framework with mixing priors across spectral and spatial for compressive spectral imaging,

Y. Dong, D. Gao, T. Qiu, Y. Li, M. Yang, and G. Shi, “Residual degradation learning unfolding framework with mixing priors across spectral and spatial for compressive spectral imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22262–22271

  35. [35]

Alternating direction unfolding with a cross spectral attention prior for dual-camera compressive hyperspectral imaging,

Y. Dong, D. Gao, D. Liu, Y. Liu, and G. Shi, “Alternating direction unfolding with a cross spectral attention prior for dual-camera compressive hyperspectral imaging,” IEEE Transactions on Image Processing, vol. 34, pp. 5325–5340, 2025

  36. [36]

    Progressive content-aware coded hyperspectral snapshot compressive imaging,

X. Zhang, B. Chen, W. Zou, S. Liu, Y. Zhang, R. Xiong, and J. Zhang, “Progressive content-aware coded hyperspectral snapshot compressive imaging,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 11, pp. 10817–10830, 2024

  37. [37]

Dual-domain feature fusion and multi-level memory-enhanced network for spectral compressive imaging,

Y. Ying, J. Wang, Y. Shi, N. Ling, and B. Yin, “Dual-domain feature fusion and multi-level memory-enhanced network for spectral compressive imaging,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 10, pp. 9562–9577, 2024

  38. [38]

Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging,

L. Wang, Z. Xiong, G. Shi, F. Wu, and W. Zeng, “Adaptive nonlocal sparse representation for dual-camera compressive hyperspectral imaging,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 10, pp. 2104–2111, 2016

  39. [39]

Exploring nonlocal group sparsity under transform learning for hyperspectral image denoising,

Y. Chen, W. He, X.-L. Zhao, T.-Z. Huang, J. Zeng, and H. Lin, “Exploring nonlocal group sparsity under transform learning for hyperspectral image denoising,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–18, 2022

  40. [40]

Fast large-scale hyperspectral image denoising via non-iterative low-rank subspace representation,

Y. Chen, J. Zeng, W. He, X.-L. Zhao, T.-X. Jiang, and Q. Huang, “Fast large-scale hyperspectral image denoising via non-iterative low-rank subspace representation,” IEEE Transactions on Geoscience and Remote Sensing, 2024

  41. [41]

    Non-local means denoising,

    A. Buades, B. Coll, and J.-M. Morel, “Non-local means denoising,” Image Processing On Line, vol. 1, pp. 208–212, 2011

  42. [42]

    Thick cloud removal in multitemporal remote sensing images via low-rank regularized self-supervised network,

Y. Chen, M. Chen, W. He, J. Zeng, M. Huang, and Y.-B. Zheng, “Thick cloud removal in multitemporal remote sensing images via low-rank regularized self-supervised network,” IEEE Transactions on Geoscience and Remote Sensing, 2024

  43. [43]

    You only look once: Unified, real-time object detection,

J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788

  44. [44]

    Focal loss for dense object detection,

T.-Y. Ross and G. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2980–2988

  45. [45]

    Faster r-cnn: Towards real-time object detection with region proposal networks,

S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2016

  46. [46]

    Fcos: A simple and strong anchor-free object detector,

Z. Tian, C. Shen, H. Chen, and T. He, “Fcos: A simple and strong anchor-free object detector,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 4, pp. 1922–1933, 2020

  47. [47]

    YOLOX: Exceeding YOLO Series in 2021

Z. Ge, “Yolox: Exceeding yolo series in 2021,” arXiv preprint arXiv:2107.08430, 2021

  48. [48]

    End-to-end object detection with transformers,

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European Conference on Computer Vision. Springer, 2020, pp. 213–229

  49. [49]

    Deformable DETR: Deformable Transformers for End-to-End Object Detection

X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable detr: Deformable transformers for end-to-end object detection,” arXiv preprint arXiv:2010.04159, 2020

  50. [50]

    Image-adaptive yolo for object detection in adverse weather conditions,

W. Liu, G. Ren, R. Yu, S. Guo, J. Zhu, and L. Zhang, “Image-adaptive yolo for object detection in adverse weather conditions,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 1792–1800

  51. [51]

    Togethernet: Bridging image restoration and object detection together via dynamic enhancement learning,

Y. Wang, X. Yan, K. Zhang, L. Gong, H. Xie, F. L. Wang, and M. Wei, “Togethernet: Bridging image restoration and object detection together via dynamic enhancement learning,” in Computer Graphics Forum, vol. 41, no. 7. Wiley Online Library, 2022, pp. 465–476

  52. [52]

    Image enhancement guided object detection in visually degraded scenes,

H. Liu, F. Jin, H. Zeng, H. Pu, and B. Fan, “Image enhancement guided object detection in visually degraded scenes,” IEEE Transactions on Neural Networks and Learning Systems, 2023

  53. [53]

    Dsnet: Joint semantic learning for object detection in inclement weather conditions,

S.-C. Huang, T.-H. Le, and D.-W. Jaw, “Dsnet: Joint semantic learning for object detection in inclement weather conditions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 8, pp. 2623–2633, 2020

  54. [54]

Denet: detection-driven enhancement network for object detection under adverse weather conditions,

Q. Qin, K. Chang, M. Huang, and G. Li, “Denet: detection-driven enhancement network for object detection under adverse weather conditions,” in Proceedings of the Asian Conference on Computer Vision, 2022, pp. 2813–2829

  55. [55]

    Fa-yolo: An improved yolo model for infrared occlusion object detection under confusing background,

S. Du, B. Zhang, P. Zhang, P. Xiang, and H. Xue, “Fa-yolo: An improved yolo model for infrared occlusion object detection under confusing background,” Wireless Communications and Mobile Computing, vol. 2021, no. 1, p. 1896029, 2021

  56. [56]

Joint-sparse-blocks and low-rank representation for hyperspectral unmixing,

J. Huang, T.-Z. Huang, L.-J. Deng, and X.-L. Zhao, “Joint-sparse-blocks and low-rank representation for hyperspectral unmixing,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 4, pp. 2419–2438, 2018