Recognition: no theorem link
EmambaIR: Efficient Visual State Space Model for Event-guided Image Reconstruction
Pith reviewed 2026-05-11 02:00 UTC · model grok-4.3
The pith
A state space model reconstructs images from event streams with linear complexity and better accuracy than CNNs or transformers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EmambaIR combines a cross-modal Top-k Sparse Attention Module (TSAM), which performs efficient pixel-level top-k attention to guide event-image fusion, with a Gated State-Space Module (GSSM), which adds a nonlinear gated unit to vanilla linear state space models. Together these components capture global contextual dependencies and temporal information from sparse event streams without quadratic cost. Applied to three reconstruction tasks across six datasets, the architecture yields higher-quality outputs than prior CNN- and ViT-based methods while consuming substantially less memory and computation.
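The review does not reproduce TSAM's equations; a minimal sketch of pixel-level top-k sparse attention under the usual interpretation (keep only the k largest query-key scores per query before the softmax) might look like this — the function name and plain-list tensors are illustrative, not the authors' implementation:

```python
import math

def topk_sparse_attention(queries, keys, values, k):
    """For each query, attend only to the k keys with the largest
    dot-product score; all other scores are dropped before softmax.
    queries/keys: lists of equal-length float lists; values: one per key."""
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
        # indices of the k largest scores -- the "sparse" part
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        exps = [math.exp(scores[i]) for i in top]
        z = sum(exps)
        weights = [e / z for e in exps]
        dim = len(values[0])
        out = [sum(w * values[i][d] for w, i in zip(weights, top)) for d in range(dim)]
        outputs.append(out)
    return outputs
```

With k equal to the number of keys this reduces to dense attention; a smaller k zeroes out weak cross-modal correspondences, which is plausibly the source of the "rich yet sparse fusion features" the paper claims.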
What carries the argument
The cross-modal Top-k Sparse Attention Module for sparse cross-modal fusion paired with the Gated State-Space Module that enhances temporal representation inside linear-complexity state space layers.
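GSSM's exact gating is not given in this review; a toy sketch of the general pattern (a linear state-space recurrence whose output is modulated by a nonlinear gate computed from the input) could look like the following, where the scalar parameters and sigmoid gate are assumptions standing in for the paper's gated unit:

```python
import math

def gated_ssm_scan(xs, a=0.9, b=1.0, c=1.0, wg=1.0):
    """Linear SSM recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t,
    modulated by a sigmoid gate on the input -- a toy stand-in for
    the nonlinear gated unit described in the paper.
    Runs in O(n) for a sequence of length n."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x                        # linear state update
        gate = 1.0 / (1.0 + math.exp(-wg * x))   # nonlinear gate
        ys.append(c * h * gate)
    return ys
```

A real GSSM would use vector-valued states and learned, per-channel parameters; the point of the sketch is only that the gate adds nonlinearity without breaking the O(n) scan.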
If this is right
- It delivers higher reconstruction quality than state-of-the-art methods on motion deblurring, deraining, and HDR enhancement across six datasets.
- It achieves substantial reductions in memory consumption and computational cost relative to vision transformers.
- Its linear complexity supports use in high-resolution scenarios where prior transformer approaches become prohibitive.
- It effectively processes the spatially sparse and temporally continuous nature of event streams for image restoration.
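The high-resolution argument in these bullets is just arithmetic on the complexity classes; a quick, assumption-free comparison of operation counts:

```python
def pairwise_ops(h, w):
    """Dense self-attention computes a score for every token pair:
    O(n^2) with n = h*w tokens."""
    n = h * w
    return n * n

def scan_ops(h, w):
    """A state-space scan touches each token once: O(n)."""
    return h * w

# Doubling resolution in both dimensions quadruples the token count,
# so attention cost grows 16x while the scan grows only 4x.
ratio_attn = pairwise_ops(512, 512) / pairwise_ops(256, 256)
ratio_scan = scan_ops(512, 512) / scan_ops(256, 256)
```

This gap is what makes transformer attention prohibitive at megapixel resolutions while a linear scan remains tractable.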
Where Pith is reading between the lines
- The same sparse-attention-plus-gating pattern could be tested on other event-based tasks such as tracking or segmentation.
- If the linear scaling holds beyond the reported resolutions, the model might run in real time on edge hardware for autonomous systems.
- The gated enhancement technique could be applied to other state-space-model variants in computer vision to improve temporal modeling.
Load-bearing premise
That the top-k sparse attention and gated state-space additions together model global dependencies and event-image interactions more effectively than CNNs or vision transformers while preserving linear scaling.
What would settle it
Direct head-to-head runs on the six datasets for motion deblurring, deraining, and HDR enhancement would settle it: if EmambaIR consumed more memory or time than the current best methods, or produced lower-quality reconstructions, the central efficiency and performance claims would be disproved.
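Settling the efficiency half of the claim only requires instrumented runs; a minimal, library-agnostic harness (the profiled callable below is a placeholder, not the authors' model) could record both wall time and peak Python-heap memory:

```python
import time
import tracemalloc

def profile(fn, *args):
    """Return (result, seconds, peak_bytes) for one call to fn."""
    tracemalloc.start()
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Stand-in workload; in a real comparison, fn would be each model's
# forward pass on identical inputs.
out, secs, peak = profile(sum, range(10_000))
```

For GPU models the same protocol applies with the framework's own timers and memory counters, run on identical inputs and resolutions for every method compared.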
Original abstract
Recent event-based image reconstruction methods predominantly rely on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) to process complementary event information. However, these architectures face fundamental limitations: CNNs often fail to capture global feature correlations, whereas ViTs incur quadratic computational complexity (e.g., $O(n^2)$), hindering their application in high-resolution scenarios. To address these bottlenecks, we introduce EmambaIR, an Efficient visual State Space Model designed for image reconstruction using spatially sparse and temporally continuous event streams. Our framework introduces two key components: the cross-modal Top-k Sparse Attention Module (TSAM) and the Gated State-Space Module (GSSM). TSAM efficiently performs pixel-level top-k sparse attention to guide cross-modal interactions, yielding rich yet sparse fusion features. Subsequently, GSSM utilizes a nonlinear gated unit to enhance the temporal representation of vanilla linear-complexity ($O(n)$) SSMs, effectively capturing global contextual dependencies without the typical computational overhead. Extensive experiments on six datasets across three diverse image reconstruction tasks - motion deblurring, deraining, and High Dynamic Range (HDR) enhancement - demonstrate that EmambaIR significantly outperforms state-of-the-art methods while offering substantial reductions in memory consumption and computational cost. The source code and data are publicly available at: https://github.com/YunhangWickert/EmambaIR
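The abstract leaves the event representation unspecified; a common choice in event-based vision (an assumption here, not something the abstract confirms) is to bin raw events $(x, y, t, p)$ into a spatio-temporal voxel grid before feeding them to the network:

```python
def events_to_voxel_grid(events, h, w, bins, t0, t1):
    """Accumulate signed event polarities into `bins` temporal slices.
    events: iterable of (x, y, t, polarity) with polarity in {-1, +1}.
    Returns a bins x h x w nested list."""
    grid = [[[0.0] * w for _ in range(h)] for _ in range(bins)]
    span = max(t1 - t0, 1e-9)
    for x, y, t, p in events:
        b = min(int((t - t0) / span * bins), bins - 1)  # temporal slice
        grid[b][y][x] += float(p)
    return grid
```

Such a grid preserves the spatial sparsity and temporal ordering the paper emphasizes, which is exactly what the referee asks to be held consistent across datasets.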
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces EmambaIR, an efficient visual state space model for event-guided image reconstruction from sparse event streams. It proposes two modules: the cross-modal Top-k Sparse Attention Module (TSAM) for efficient pixel-level sparse attention to enable cross-modal fusion, and the Gated State-Space Module (GSSM) that augments linear-complexity SSMs with a nonlinear gated unit for improved temporal modeling. The framework is evaluated on motion deblurring, deraining, and HDR enhancement tasks across six datasets, claiming superior performance over CNN- and ViT-based SOTA methods alongside substantial reductions in memory and compute.
Significance. If the empirical results and efficiency claims hold, the work offers a promising direction for scalable event-based vision by leveraging SSMs to achieve global context with linear complexity, potentially enabling high-resolution applications where ViTs are prohibitive. Public release of code and data is a positive factor for reproducibility.
Minor comments (3)
- [Abstract] The claim of significant outperformance and efficiency gains is stated without numerical metrics, error bars, or dataset identifiers, which reduces immediate assessability even though the full manuscript presumably contains these details in the experiments section.
- [§4 Experiments] Confirm that all reported comparisons use consistent evaluation protocols (e.g., the same input resolutions and event representations) across the six datasets to support the cross-task generalization claim.
- [Figure 3, or equivalent architecture diagram] Clarify the exact integration point of TSAM outputs into GSSM to avoid ambiguity in how cross-modal features propagate through the state-space layers.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of EmambaIR and the recommendation for minor revision. We appreciate the recognition that the approach offers a promising direction for scalable event-based vision by achieving global context with linear complexity.
Circularity Check
No significant circularity detected
full rationale
The manuscript introduces EmambaIR as an architectural framework combining TSAM (cross-modal Top-k Sparse Attention) and GSSM (Gated State-Space Module) to process event streams for image reconstruction. Its central claims rest on empirical results across six datasets for deblurring, deraining, and HDR tasks, plus the standard linear-complexity property of SSMs. No derivation chain, equation, or performance prediction reduces by construction to a fitted parameter, self-definition, or self-citation; the modules are defined independently and evaluated externally. This is the expected non-finding for an empirical architecture paper whose results are not forced by internal re-labeling of inputs.