pith. machine review for the scientific record.

arxiv: 2605.08073 · v1 · submitted 2026-05-08 · 💻 cs.CV · cs.AI

Recognition: no theorem link

EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:00 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI
keywords event-based vision · image reconstruction · state space models · motion deblurring · deraining · HDR enhancement · sparse attention · efficient neural networks

The pith

A state space model uses event streams to guide image reconstruction with linear complexity and better accuracy than CNNs or transformers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EmambaIR to adapt state space models for event-guided image reconstruction. It targets the inability of CNNs to capture global correlations and the quadratic cost of vision transformers by adding two components that keep overall complexity linear. The first performs sparse pixel-level attention to fuse event and image data, while the second adds gating to improve temporal modeling inside standard state space layers. Experiments across motion deblurring, deraining, and HDR tasks on six datasets show gains in quality alongside lower memory and compute use. A sympathetic reader would care because event cameras supply fast, high-dynamic-range data whose reconstruction has previously been too slow or memory-heavy for practical high-resolution use.
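As a concrete reading of the fusion step, here is a minimal sketch of pixel-level top-k sparse cross-attention in PyTorch. Only the abstract is available, so the tensor layout, the absence of learned projections, the value of k, and the function name are assumptions for illustration, not the paper's TSAM. Note also that this didactic version materializes the full score matrix before masking, which is still quadratic; an efficient implementation would avoid that.

    import torch
    import torch.nn.functional as F

    def topk_sparse_attention(img_feat, evt_feat, k=16):
        # Hypothetical layout: img_feat (B, N, C) image tokens as queries,
        # evt_feat (B, N, C) event tokens as keys and values.
        q, kx, v = img_feat, evt_feat, evt_feat
        scores = q @ kx.transpose(-2, -1) / q.shape[-1] ** 0.5  # (B, N, N)
        topv, topi = scores.topk(k, dim=-1)       # k strongest keys per query
        masked = torch.full_like(scores, float("-inf"))
        masked.scatter_(-1, topi, topv)           # keep scores only at top-k slots
        attn = F.softmax(masked, dim=-1)          # masked entries vanish after softmax
        return attn @ v                           # sparsely fused event features

The top-k mask is what makes the fusion "rich yet sparse" in the abstract's terms: each image pixel attends only to its k best-matching event features rather than to the full map.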

Core claim

EmambaIR pairs a cross-modal Top-k Sparse Attention Module, which performs efficient pixel-level top-k attention to guide event-image fusion, with a Gated State-Space Module, which adds a nonlinear gated unit to vanilla linear state space models. Together they capture global contextual dependencies and temporal information from sparse event streams without quadratic cost. Applied to three reconstruction tasks, the architecture yields higher-quality outputs than prior CNN- and ViT-based methods while consuming substantially less memory and computation across six datasets.

What carries the argument

The cross-modal Top-k Sparse Attention Module for sparse cross-modal fusion paired with the Gated State-Space Module that enhances temporal representation inside linear-complexity state space layers.
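Read under assumptions, the gated half amounts to a linear recurrence whose output passes through a learned nonlinear gate. The sketch below uses a diagonal SSM scan and a SiLU gate; the paper's actual GSSM parameterization is not given in this page, so the class name, the gate choice, and the recurrence form are all illustrative.

    import torch
    import torch.nn as nn

    class GatedSSM(nn.Module):
        # Diagonal linear SSM, h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, with the
        # output modulated by a nonlinear gate -- one plausible reading of the
        # abstract's "nonlinear gated unit", not the paper's exact module.
        def __init__(self, dim):
            super().__init__()
            self.a_logit = nn.Parameter(torch.zeros(dim))   # decay, squashed to (0, 1)
            self.b = nn.Parameter(torch.ones(dim))
            self.c = nn.Parameter(torch.ones(dim))
            self.gate = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())

        def forward(self, x):                    # x: (B, T, D) token sequence
            a = torch.sigmoid(self.a_logit)      # per-channel decay in (0, 1)
            h = torch.zeros_like(x[:, 0])
            ys = []
            for t in range(x.shape[1]):          # O(T) scan; parallel scans also exist
                h = a * h + self.b * x[:, t]     # linear state update
                ys.append(self.c * h)
            y = torch.stack(ys, dim=1)
            return y * self.gate(x)              # nonlinear gating of the SSM output

Because the recurrence itself stays linear, the O(n) complexity the abstract claims for vanilla SSMs is preserved; the nonlinearity enters only through the elementwise gate.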

If this is right

  • It delivers higher reconstruction quality than state-of-the-art methods on motion deblurring, deraining, and HDR enhancement across six datasets.
  • It achieves substantial reductions in memory consumption and computational cost relative to vision transformers.
  • Its linear complexity supports use in high-resolution scenarios where prior transformer approaches become prohibitive (see the scaling sketch after this list).
  • It effectively processes the spatially sparse and temporally continuous nature of event streams for image restoration.
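To make the scaling bullet concrete, a back-of-envelope comparison under assumed constants, not measurements from the paper: global self-attention over n pixel tokens costs on the order of n² operations, while a single SSM scan costs on the order of n.

    # Illustrative scaling arithmetic only; the constants are assumptions.
    def attn_ops(n, c):   # score matrix plus weighted sum over n tokens
        return 2 * n * n * c

    def ssm_ops(n, c):    # one linear recurrence pass over n tokens
        return n * c

    for side in (256, 512, 1024):          # square images, one token per pixel
        n = side * side
        ratio = attn_ops(n, 64) / ssm_ops(n, 64)
        print(f"{side}x{side}: attention/SSM op ratio ~ {ratio:,.0f}x")

At 1024x1024 the gap exceeds two million-fold under these assumptions, which is the arithmetic behind the claim that quadratic attention becomes prohibitive at high resolution.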

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sparse-attention-plus-gating pattern could be tested on other event-based tasks such as tracking or segmentation.
  • If the linear scaling holds beyond the reported resolutions, the model might run in real time on edge hardware for autonomous systems.
  • The gated enhancement technique could be applied to other state-space-model variants in computer vision to improve temporal modeling.

Load-bearing premise

That the top-k sparse attention and gated state-space additions together model global dependencies and event-image interactions more effectively than CNNs or vision transformers while preserving linear scaling.

What would settle it

Direct head-to-head runs on the six datasets for motion deblurring, deraining, and HDR enhancement: if EmambaIR used more memory or time than the current best methods, or produced lower-quality reconstructions, the central efficiency and performance claims would be disproved.

Figures

Figures reproduced from arXiv: 2605.08073 by Wei Yu, Yunhang Qian.

Figure 1: A performance and efficiency comparison be…
Figure 2: Overall architecture of (a) our EmambaIR for event-guided image reconstruction, which consists of a…
Figure 4: Illustration of the proposed nonlinear gated unit.
Figure 5: Qualitative comparison results of different image deblurring methods on the GoPro dataset.
Figure 6: Qualitative comparison results of different image…
Figure 9: Ablation studies on the impact of varying…
read the original abstract

Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity (e.g., $O(n^2)$), hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. Our framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). TSAM efficiently performs pixel-level top-k sparse attention to guide cross-modal interactions, yielding rich yet sparse fusion features. Subsequently, GSSM utilizes a nonlinear gated unit to enhance the temporal representation of vanilla linear-complexity ($O(n)$) SSMs, effectively capturing global contextual dependencies without the typical computational overhead. Extensive experiments on six datasets across three diverse image reconstruction tasks - motion deblurring, deraining, and High Dynamic Range (HDR) enhancement - demonstrate that EmambaIR significantly outperforms state-of-the-art methods while offering substantial reductions in memory consumption and computational cost. The source code and data are publicly available at: https://github.com/YunhangWickert/EmambaIR

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces EmambaIR, an efficient visual state space model for event-guided image reconstruction from sparse event streams. It proposes two modules: the cross-modal Top-k Sparse Attention Module (TSAM) for efficient pixel-level sparse attention to enable cross-modal fusion, and the Gated State-Space Module (GSSM) that augments linear-complexity SSMs with a nonlinear gated unit for improved temporal modeling. The framework is evaluated on motion deblurring, deraining, and HDR enhancement tasks across six datasets, claiming superior performance over CNN- and ViT-based SOTA methods alongside substantial reductions in memory and compute.

Significance. If the empirical results and efficiency claims hold, the work offers a promising direction for scalable event-based vision by leveraging SSMs to achieve global context with linear complexity, potentially enabling high-resolution applications where ViTs are prohibitive. Public release of code and data is a positive factor for reproducibility.

minor comments (3)
  1. [Abstract] The claim of significant outperformance and efficiency gains is stated without numerical metrics, error bars, or dataset identifiers, which reduces immediate assessability even though the full manuscript presumably contains these details in the experiments section.
  2. [§4, Experiments] Confirm that all reported comparisons use consistent evaluation protocols (e.g., the same input resolutions and event representations) across the six datasets to support the cross-task generalization claim.
  3. [Figure 3, or equivalent architecture diagram] Clarify the exact integration point of TSAM outputs into GSSM to avoid ambiguity about how cross-modal features propagate through the state-space layers.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of EmambaIR and the recommendation for minor revision. We appreciate the recognition that the approach offers a promising direction for scalable event-based vision by achieving global context with linear complexity.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript introduces EmambaIR as an architectural framework combining TSAM (cross-modal Top-k Sparse Attention) and GSSM (Gated State-Space Module) to process event streams for image reconstruction. Its central claims rest on empirical results across six datasets for deblurring, deraining, and HDR tasks, plus the standard linear-complexity property of SSMs. No derivation chain, equation, or performance prediction reduces by construction to a fitted parameter, self-definition, or self-citation; the modules are defined independently and evaluated externally. This is the expected non-finding for an empirical architecture paper whose results are not forced by internal re-labeling of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no specific free parameters, axioms, or invented entities can be audited. The central claim rests on standard deep-learning assumptions about generalization from the reported experiments and the linear scaling property of SSMs.

pith-pipeline@v0.9.0 · 5541 in / 1167 out tokens · 50502 ms · 2026-05-11T02:00:21.769942+00:00 · methodology


Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 2 internal anchors

  1. [1] Ethan Baron, Itamar Zimerman, and Lior Wolf. 2-D SSM: A general spatial layer for visual transformers. arXiv preprint arXiv:2306.06635, 2023.
  2. [2] Yuanhao Cai, Hao Bian, Jing Lin, Haoqian Wang, Radu Timofte, and Yulun Zhang. Retinexformer: One-stage Retinex-based transformer for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12504–12513, 2023.
  3. [3] Liangyu Chen, Xin Lu, Jie Zhang, Xiaojie Chu, and Chengpeng Chen. HINet: Half instance normalization network for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 182–192, 2021.
  4. [4] Xiang Chen, Hao Li, Mingqiang Li, and Jinshan Pan. Learning a sparse transformer network for effective image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5896–5905, 2023.
  5. [5] S.J. Cho, S.W. Ji, J.P. Hong, S.W. Jung, and S.J. Ko. Rethinking coarse-to-fine approach in single image deblurring. 2021.
  6. [6] Haram Choi, Cheolwoong Na, Jihyeon Oh, Seungjae Lee, Jinseop Kim, Subeen Choe, Jeongmin Lee, Taehoon Kim, and Jihoon Yang. Reciprocal attention mixing transformer for lightweight image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5992–6002, 2024.
  7. [7] Xiaojie Chu, Liangyu Chen, and Wenqing Yu. NAFSSR: Stereo image super-resolution using NAFNet. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1239–1248, 2022.
  8. [8] Huiyuan Fu, Wenkai Zheng, Xicong Wang, Jiaxuan Wang, Heng Zhang, and Huadong Ma. Dancing in the dark: A benchmark towards general low-light video enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 12877–12886, 2023.
  9. [9] Hu Gao, Bowen Ma, Ying Zhang, Jingfan Yang, Jing Yang, and Depeng Dang. Learning enriched features via selective state spaces model for efficient image deblurring. In Proceedings of the 32nd ACM International Conference on Multimedia, pages 710–718, 2024.
  10. [10] Ning Gao, Xingyu Jiang, Xiuhui Zhang, and Yue Deng. Efficient frequency-domain image deraining with contrastive regularization. In European Conference on Computer Vision (ECCV). Springer, 2024.
  11. [12] Albert Gu, Karan Goel, and Christopher Ré. Efficiently modeling long sequences with structured state spaces. arXiv:2111.00396, 2021.
  12. [13] Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, and Shu-Tao Xia. MambaIR: A simple baseline for image restoration with state-space model. In European Conference on Computer Vision, pages 222–241. Springer, 2025.
  13. [14] Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, and Lu Wang. Multi-scale representation learning for image restoration with state-space model. arXiv preprint arXiv:2408.10145, 2024.
  14. [15] Md Mohaiminul Islam and Gedas Bertasius. Long movie clip classification with state-space video models. In European Conference on Computer Vision, pages 87–104. Springer, 2022.
  15. [16] Hojin Jang, Devin McCormack, and Frank Tong. Noise-trained deep neural networks effectively predict human vision and its neural responses to challenging images. PLoS Biology, 19(12):e3001418, 2021.
  16. [17] Z. Jiang, Y. Zhang, D. Zou, J. Ren, J. Lv, and Y. Liu. Learning event-based motion deblurring. 2020.
  17. [18] Taewoo Kim, Hoonhee Cho, and Kuk-Jin Yoon. Frequency-aware event-based video deblurring for real-world motion blur. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24966–24976, 2024.
  18. [19] D.P. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.
  19. [20] J. Kotera, F. Sroubek, and P. Milanfar. Blind deconvolution using alternating maximum a posteriori estimation with heavy-tailed priors. 2013.
  20. [21] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. 2011.
  21. [22] Hunsang Lee, Hyesong Choi, Kwanghoon Sohn, and Dongbo Min. KNN local attention for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2139–2149, 2022.
  22. [23] Guoqiang Liang, Kanghao Chen, Hangyu Li, Yunfan Lu, and Lin Wang. Towards robust event-guided low-light image enhancement: A large-scale real-world event-image dataset and novel approach. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23–33, 2024.
  23. [24] Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, and Radu Timofte. SwinIR: Image restoration using Swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1833–1844, 2021.
  24. [25] S. Lin, J. Zhang, J. Pan, Z. Jiang, D. Zou, Y. Wang, J. Chen, and J. Ren. Learning event-driven video deblurring and interpolation. 2020.
  25. [26] Hanxiao Liu, Zihang Dai, David So, and Quoc V. Le. Pay attention to MLPs. Advances in Neural Information Processing Systems, 34:9204–9215, 2021.
  26. [27] Yue Liu, Yunjie Tian, Yuzhong Zhao, Hongtian Yu, Lingxi Xie, Yaowei Wang, Qixiang Ye, and Yunfan Liu. VMamba: Visual state space model. arXiv:2401.10166, 2024.
  27. [28] Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, and Hui Xiong. Event camera demosaicing via Swin transformer and pixel-focus loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1095–1105, 2024.
  28. [29] Jun Ma, Feifei Li, and Bo Wang. U-Mamba: Enhancing long-range dependency for biomedical image segmentation. arXiv preprint arXiv:2401.04722, 2024.
  29. [30] Nico Messikommer, Stamatios Georgoulis, Daniel Gehrig, Stepan Tulyakov, Julius Erbach, Alfredo Bochicchio, Yuanyou Li, and Davide Scaramuzza. Multi-bracket high dynamic range imaging with event cameras. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 547–557, 2022.
  30. [31] S. Nah, T. Hyun Kim, and K. Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. CVPR, 2017.
  31. [32] Xuran Pan, Chunjiang Ge, Rui Lu, Shiji Song, Guanfu Chen, Zeyi Huang, and Gao Huang. On the integration of self-attention and convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 815–825, 2022.
  32. [33] Yanyuan Qiao, Zheng Yu, Longteng Guo, Sihan Chen, Zijia Zhao, Mingzhen Sun, Qi Wu, and Jing Liu. VL-Mamba: Exploring state space models for multimodal learning. arXiv:2403.13600, 2024.
  33. [34] H. Rebecq, D. Gehrig, and D. Scaramuzza. ESIM: An open event camera simulator. CoRL, 2018.
  34. [35] Wang Shen, Wenbo Bao, Guangtao Zhai, Li Chen, Xiongkuo Min, and Zhiyong Gao. Blurry video frame interpolation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5114–5123, 2020.
  35. [36] Yuan Shi, Bin Xia, Xiaoyu Jin, Xing Wang, Tianyu Zhao, Xin Xia, Xuefeng Xiao, and Wenming Yang. VmambaIR: Visual state space model for image restoration. arXiv preprint arXiv:2403.11423, 2024.
  36. [37] T. Stoffregen, C. Scheerlinck, D. Scaramuzza, T. Drummond, N. Barnes, L. Kleeman, and R. Mahony. Reducing the sim-to-real gap for event cameras. ECCV, 2020.
  37. [38] M. Suin, K. Purohit, and A.N. Rajagopalan. Spatially-attentive patch-hierarchical network for adaptive motion deblurring. 2020.
  38. [39] Lei Sun, Christos Sakaridis, Jingyun Liang, Qi Jiang, Kailun Yang, Peng Sun, Yaozu Ye, Kaiwei Wang, and Luc Van Gool. Event-based fusion for motion deblurring with cross-modal attention. In European Conference on Computer Vision, pages 412–428. Springer, 2022.
  39. [40] Lei Sun, Christos Sakaridis, Jingyun Liang, Peng Sun, Jiezhang Cao, Kai Zhang, Qi Jiang, Kaiwei Wang, and Luc Van Gool. Event-based frame interpolation with ad-hoc deblurring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18043–18052, 2023.
  40. [41] Shangquan Sun, Wenqi Ren, Xinwei Gao, Rui Wang, and Xiaochun Cao. Restoring images in adverse weather conditions via histogram transformer. In European Conference on Computer Vision (ECCV), pages 111–129. Springer, 2024.
  41. [42] Zhijing Sun, Xueyang Fu, Longzhuo Huang, Aiping Liu, and Zheng-Jun Zha. Motion aware event representation-driven image deblurring. In European Conference on Computer Vision, pages 418–
  42. [43] Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, and Wenjun Zeng. Sparse MLP for image recognition: Is self-attention really necessary? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 2344–2351, 2022.
  43. [44] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia. Scale-recurrent network for deep image deblurring. 2018.
  44. [45] F.J. Tsai, Y.T. Peng, Y.Y. Lin, C.C. Tsai, and C.W. Lin. BANet: Blur-aware attention networks for dynamic scene deblurring. arXiv:2101.07518, 2021.
  45. [46] Bishan Wang, Jingwei He, Lei Yu, Gui-Song Xia, and Wen Yang. Event enhanced high-quality image recovery. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIII, pages 155–171. Springer, 2020.
  46. [47] Haochen Wang, Jiayi Shen, Yongtuo Liu, Yan Gao, and Efstratios Gavves. NFormer: Robust person re-identification with neighbor transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7297–7307, 2022.
  47. [48] Pichao Wang, Xue Wang, Fan Wang, Ming Lin, Shuning Chang, Hao Li, and Rong Jin. KVT: k-NN attention for boosting vision transformers. In European Conference on Computer Vision, pages 285–302. Springer, 2022.
  48. [49] Zhendong Wang, Xiaodong Cun, Jianmin Bao, Wengang Zhou, Jianzhuang Liu, and Houqiang Li. Uformer: A general U-shaped transformer for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17683–17693, 2022.
  49. [50] Wenming Weng, Yueyi Zhang, and Zhiwei Xiong. Event-based video reconstruction using transformer. ICCV, pages 2563–2572, 2021.
  50. [51] Li Xiaopeng, Zeng Zhaoyuan, Fan Cien, Zhao Chen, Deng Lei, and Yu Lei. HDR imaging for dynamic scenes with events. arXiv preprint arXiv:2404.03210, 2024.
  51. [52] Zhaohu Xing, Tian Ye, Yijun Yang, Guang Liu, and Lei Zhu. SegMamba: Long-range sequential modeling Mamba for 3D medical image segmentation. arXiv:2401.13560, 2024.
  52. [53] L. Xu, S. Zheng, and J. Jia. Unnatural L0 sparse representation for natural image deblurring. 2013.
  53. [54] Wen Yang, Jinjian Wu, Leida Li, Weisheng Dong, and Guangming Shi. Event-based motion deblurring with modality-aware decomposition and recomposition. In Proceedings of the 31st ACM International Conference on Multimedia, pages 8327–8335, 2023.
  54. [55] Yixin Yang, Jin Han, Jinxiu Liang, Imari Sato, and Boxin Shi. Learning event guided high dynamic range video reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13924–13934, 2023.
  55. [56] Wei Yu, Jianing Li, Shengping Zhang, and Xiangyang Ji. Learning scale-aware spatio-temporal implicit representation for event-based motion deblurring. In Forty-first International Conference on Machine Learning, 2024.
  56. [57] S.W. Zamir, A. Arora, S. Khan, M. Hayat, F.S. Khan, M.H. Yang, and L. Shao. Multi-stage progressive image restoration. 2021.
  57. [58] Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Ming-Hsuan Yang. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5728–5739, 2022.
  58. [59] Jiale Zhang, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, and Xin Yuan. Accurate image restoration with attention retractable transformer. arXiv preprint arXiv:2210.01427, 2022.
  59. [60] Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, and Xu Sun. Explicit sparse transformer: Concentrated attention through explicit selection. arXiv preprint arXiv:1912.11637, 2019.
  60. [61] Lianghui Zhu, Bencheng Liao, Qian Zhang, Xinlong Wang, Wenyu Liu, and Xinggang Wang. Vision Mamba: Efficient visual representation learning with bidirectional state space model. arXiv:2401.09417, 2024.
  61. [62] Yurui Zhu, Tianyu Wang, Xueyang Fu, Xuanyu Yang, Xin Guo, Jifeng Dai, Yu Qiao, and Xiaowei Hu. Learning weather-general and weather-specific features for image restoration under multiple adverse weather conditions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.